MIBM Performance Optimizations #1157

Open
danieljvickers wants to merge 17 commits into MFlowCode:master from danieljvickers:gpu-optimizations

Conversation

@danieljvickers (Member) commented Feb 17, 2026

User description

Description

Following the levelset refactor, several performance optimizations remained to be made to the code. This PR introduces optimizations that make the multi-particle MIBM code viable. It also raises the upper bound on the number of allowed immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.

This PR also extends these optimizations to STL IBs, which are now fully capable of running on the GPU. An example is given below.

Type of change

  • New feature
  • Refactor
  • Other: Performance Tuning

Testing

All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres placed at random, non-overlapping positions generated by the following case file:

import json
import argparse

num_cells = [240, 240, 240]
dim = [8., 8., 8.]
num_particles = 1000

# create a stationary fluid
case = {
    "run_time_info": "T",
    "parallel_io": "T",
    "m": num_cells[0]-1,
    "n": num_cells[1]-1,
    "p": num_cells[2]-1,
    "dt": 0.005,
    "t_step_start": 0,
    "t_step_stop": 2,
    "t_step_save": 1,
    "num_patches": 1,
    "model_eqns": 2,
    "alt_soundspeed": "F",
    "num_fluids": 1,
    "mpp_lim": "F",
    "mixture_err": "T",
    "time_stepper": 3,
    "recon_type": 1,
    "weno_eps": 1e-16,
    "riemann_solver": 2,
    "wave_speeds": 1,
    "avg_state": 2,
    "precision": 2,
    "format": 1,
    "prim_vars_wrt": "T",
    "E_wrt": "T",
    "viscous": "T",
    "x_domain%beg": -0.5*dim[0],
    "x_domain%end": 0.5*dim[0],
    "y_domain%beg": -0.5*dim[1],
    "y_domain%end": 0.5*dim[1],
    "z_domain%beg": -0.5*dim[2],
    "z_domain%end": 0.5*dim[2],
    "bc_x%beg": -3,
    "bc_x%end": -3,
    "bc_y%beg": -3,
    "bc_y%end": -3,
    "bc_z%beg": -3,
    "bc_z%end": -3,
    "patch_icpp(1)%geometry": 9,
    "patch_icpp(1)%z_centroid": 0.0,
    "patch_icpp(1)%length_z": dim[2],
    "patch_icpp(1)%y_centroid": 0.0,
    "patch_icpp(1)%length_y": dim[1],
    "patch_icpp(1)%x_centroid": 0.0,
    "patch_icpp(1)%length_x": dim[0],
    "weno_order": 5,
    "patch_icpp(1)%pres": 1.0,
    "patch_icpp(1)%alpha_rho(1)": 1.0,
    "patch_icpp(1)%alpha(1)": 1.0,
    "patch_icpp(1)%vel(1)": 0.0,
    "patch_icpp(1)%vel(2)": 0.0,
    "patch_icpp(1)%vel(3)": 0.0,
    "fluid_pp(1)%gamma": 2.5000000000000004,
    "fluid_pp(1)%pi_inf": 0.0,
    "fluid_pp(1)%Re(1)": 2500000,
}

import random
random.seed(42)

dx = [float(dim[i]) / float(num_cells[i]) for i in range(3)]

# set particle properties
particle_radius_cells = 5 # particle radius in grid cells
particle_cell_spacing = particle_radius_cells*2 + 5 # set the spacing to be double the radius plus 5 to guarantee no image points in other IBs
mpi_cell_spacing = particle_radius_cells + 5 # space safely away from MPI halo regions to prevent out-of-bounds errors
generation_bounds = [6., 6., 6.] # generate particles in this box safely away from the boundary
velocity_magnitude = 1.

# convert non-dimensional grid cell units to the units of the simulation
radius_units = float(particle_radius_cells) * dx[0]
particle_units_spacing = float(particle_cell_spacing) * dx[0]
particle_units_spacing_squared = particle_units_spacing**2
mpi_units_spacing = [float(mpi_cell_spacing) * dx[i] for i in range(3)]

# generate an array of xyz values that guarantees non-overlapping grid cells
particles = []
while len(particles) < num_particles:
  # generate a completely random position
  position = [random.random() for i in range(3)]
  position = [(position[i] - 0.5) * generation_bounds[i] for i in range(3)]

  # first check if the particle is too close to the MPI halo regions
  valid = True
  # for i in range(3):
  #   valid = valid and abs(position[i]) >= mpi_units_spacing[i]

  # check for the minimum spacing between all particles, exiting once we find an error
  for particle in particles:
    distance_squared = sum([(particle[i] - position[i])**2 for i in range(3)])
    valid = valid and distance_squared >= particle_units_spacing_squared
    if not valid:
      break

  if valid:
    particles.append(position)
    # print(f"\rProgress: {100.*float(len(particles))/float(num_particles)}%", end="", flush=True)

# print()
# convert our array of positions to valid JSON for the immersed boundary
ib_properties = {"ib": "T", "num_ibs": num_particles,}
for i in range(len(particles)):
  ib_properties[f'patch_ib({i+1})%radius'] = radius_units
  ib_properties[f'patch_ib({i+1})%slip'] = 'F'
  ib_properties[f'patch_ib({i+1})%geometry'] = 8
  ib_properties[f'patch_ib({i+1})%moving_ibm'] = 2
  ib_properties[f'patch_ib({i+1})%mass'] = 1.
  ib_properties[f'patch_ib({i+1})%x_centroid'] = particles[i][0]
  ib_properties[f'patch_ib({i+1})%y_centroid'] = particles[i][1]
  ib_properties[f'patch_ib({i+1})%z_centroid'] = particles[i][2]
  # set the velocity radially outward to guarantee the particles never touch during the simulation
  position_mag = (sum([(particles[i][j])**2 for j in range(3)]))**0.5
  for j in range(3):
    ib_properties[f'patch_ib({i+1})%vel({j+1})'] = particles[i][j] * velocity_magnitude / position_mag

print(json.dumps({**case, **ib_properties}))

These optimizations yield a nearly 1000x speedup in the moving-IBM propagation and generation code. Prior to these optimizations, the benchmark case profiled with NVIDIA Nsight took about 45 seconds to run a single RK substep:

[Screenshot: Nsight profile, 2026-02-16]

Following these optimizations, the same profile shows roughly 50 ms per RK substep:
[Screenshot: Nsight profile, 2026-02-17]

Checklist

  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

Summary by CodeRabbit

  • New Features

    • GPU-accelerated handling for STL/model-based levelset and immersed-boundary computations (3D-capable)
  • Enhancements

    • Increased maximum patches and immersed boundaries to 1000
    • Improved STL model packing and GPU-friendly data paths for faster distance/normals and inside tests
    • More targeted domain bounding to reduce work in patch/IB routines
    • Added runtime profiling markers to aid performance analysis

CodeAnt-AI Description

Offload immersed-boundary marker, ghost/image, and STL model work to GPU and allow 1000 IBs

What Changed

  • Immersed-boundary marker generation for all patch geometries now updates a GPU-aware marker structure and runs with GPU-parallel loops (circle, rectangle, ellipse, sphere, cuboid, cylinder, airfoil, STL).
  • Ghost-point detection, image-point computation, and interpolation-coefficient computation are performed on the GPU, with reduced work ranges determined by a binary-search bounding-index helper (see the sketch after this list) to avoid scanning the whole domain.
  • STL models are read, transformed, packed into flat arrays, uploaded to the device, and used on-device for "point inside model" and nearest-triangle queries in 2D and 3D levelset calculations.
  • Case/config tooling and constants updated to permit up to 1000 immersed boundaries; validator and parameter definitions reflect the new limits.
  • Lightweight profiling ranges (NVTX) added around key IBM stages to aid GPU performance analysis.
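
To make the bounding-index idea above concrete, here is a minimal Python sketch (the Fortran helper in the PR is named get_bounding_indices, but its exact signature is not shown here, so treat the interface below as an assumption):

import bisect

def get_bounding_indices(cell_centers, lo, hi):
    """Return the index range of cells whose centers lie inside [lo, hi].

    cell_centers is assumed to be monotonically increasing, so a binary
    search (O(log N)) replaces a full scan of the axis (O(N)).
    """
    i_beg = bisect.bisect_left(cell_centers, lo)
    i_end = bisect.bisect_right(cell_centers, hi) - 1
    return i_beg, i_end

# Hypothetical usage: restrict an IB loop to the patch's bounding box in x
x_cc = [0.1*i for i in range(100)]            # cell centers along x
i_beg, i_end = get_bounding_indices(x_cc, 2.3, 4.7)
for i in range(i_beg, i_end + 1):
    pass                                      # only cells near the IB are visited

Applied per axis, this turns a full m-by-n-by-p marker/levelset sweep into a loop over the few cells surrounding each immersed boundary.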

Impact

✅ Faster IB setup on GPU
✅ Lower CPU load during IB and STL processing
✅ Support for up to 1000 immersed boundaries



coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

This PR adds GPU-ready data structures and packing for STL models, migrates levelset and IBM computations to GPU-friendly paths, replaces raw IB marker arrays with a typed interface, expands patch limits to 1000, and adds profiling/instrumentation and parallel macro support for atomic-capture endings.

Changes

Cohort / File(s) Summary
Parallel macros
src/common/include/parallel_macros.fpp
Added END_GPU_ATOMIC_CAPTURE() macro to emit OpenACC/OpenMP atomic-capture end directives.
Configuration & limits
src/common/m_constants.fpp, toolchain/mfc/params/definitions.py, toolchain/mfc/case_validator.py, toolchain/mfc/params_tests/test_definitions.py
Increased max patches/IBs from 10 to 1000 and updated related validation and tests (parameter-count expectation adjusted).
Derived types & helpers
src/common/m_derived_types.fpp, src/common/m_helper.fpp
Changed t_model_array%interpolate from logical to integer; added GPU-oriented fields (ntrs, trs_v, trs_n) and annotated f_cross with GPU_ROUTINE.
Model GPU packing & APIs
src/common/m_model.fpp
Added GPU-ready public arrays and accessors (models type with GPU fields, gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, etc.), plus s_pack_model_for_gpu and the flat variants f_model_is_inside_flat, f_distance_normals_3d_flat (see the packing sketch after this list).
Levelset computation
src/simulation/m_compute_levelset.fpp
Switched model-levelset paths to GPU-packed data and flat distance/normals routines; changed local interpolate declarations to integer and adjusted 2D/3D GPU-parallel flows accordingly.
IB patch framework
src/simulation/m_ib_patches.fpp
Replaced ib_markers_sf arrays with typed ib_markers (type(integer_field)), added get_bounding_indices, heavily refactored s_instantiate_STL_models to pack GPU data and constrain GPU loop ranges.
IBM GPU integration & profiling
src/simulation/m_ibm.fpp, src/simulation/m_time_steppers.fpp
Unified 2D/3D IBM flows for GPU, introduced NVTX ranges, atomic counters for ghost-point aggregation, GPU-parallel regions and device updates; added NVTX start/end in time-step propagation.
Tooling / tests
toolchain/mfc/...
Adjusted validation bounds and param definitions/tests to reflect increased IB limits and larger parameter registry size.
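
As a rough illustration of the GPU-friendly flat arrays described for m_model.fpp above, the packing step can be pictured as copying each model's ragged triangle list into dense, fixed-shape arrays indexed by patch. The sketch below is a Python stand-in with made-up shapes; the real s_pack_model_for_gpu fills gpu_trs_v, gpu_trs_n, and related arrays whose actual layout may differ.

import numpy as np

def pack_models_for_gpu(models):
    """Pack ragged per-model triangle lists into dense arrays for one upload.

    models: list of dicts with 'verts' of shape (ntrs, 3, 3) and
    'normals' of shape (ntrs, 3). Shorter models are zero-padded.
    """
    n_models = len(models)
    max_ntrs = max(m["verts"].shape[0] for m in models)
    ntrs = np.zeros(n_models, dtype=np.int32)        # triangle count per patch
    trs_v = np.zeros((n_models, max_ntrs, 3, 3))     # triangle vertices
    trs_n = np.zeros((n_models, max_ntrs, 3))        # triangle normals
    for p, m in enumerate(models):
        nt = m["verts"].shape[0]
        ntrs[p] = nt
        trs_v[p, :nt] = m["verts"]
        trs_n[p, :nt] = m["normals"]
    return ntrs, trs_v, trs_n

# Hypothetical usage with two tiny "models"
models = [
    {"verts": np.zeros((4, 3, 3)), "normals": np.zeros((4, 3))},
    {"verts": np.zeros((7, 3, 3)), "normals": np.zeros((7, 3))},
]
ntrs, trs_v, trs_n = pack_models_for_gpu(models)

Kernels can then walk trs_v[p, :ntrs[p]] with plain integer indexing instead of chasing derived-type and allocatable indirection on the device.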

Sequence Diagram(s)

sequenceDiagram
    participant CPU as CPU/Host
    participant GPU as GPU Device
    participant Model as Model Data
    participant LevelSet as LevelSet
    participant IBM as IBM

    CPU->>Model: s_instantiate_STL_models()
    Note over Model: Read, transform, compute bounding boxes\npack ntrs, trs_v, trs_n, boundary arrays
    CPU->>GPU: s_pack_model_for_gpu() / upload gpu_* arrays
    CPU->>LevelSet: s_compute_levelset()
    LevelSet->>GPU: f_distance_normals_3d_flat() (parallel)
    GPU-->>LevelSet: distances & normals
    CPU->>CPU: s_apply_ib_patches(ib_markers)  -- uses typed markers
    CPU->>IBM: s_update_mib()/s_find_num_ghost_points()
    IBM->>GPU: GPU kernels (ghost/image points, forces) [NVTX ranges]
    GPU-->>IBM: ghost/force data (atomic counters updated)
    IBM->>CPU: aggregated forces & updated markers

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Review effort 4/5, size:XXL

Suggested reviewers

  • sbryngelson
  • wilfonba

Poem

🐰 I pack the models, snug and tight,

I send them off to GPU light,
With atoms captured, bounds in sight,
Ghost points counted, forces bright—
Hop, compute, and leap to flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'MIBM Performance Optimizations' clearly and concisely summarizes the main focus of the changeset, which involves GPU-based performance improvements to immersed boundary method (MIBM) code.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively addresses all required template sections including motivation, type of change, testing details with benchmark data, and GPU-specific validation confirmations.





codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 57.08812% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.24%. Comparing base (c5493c1) to head (bc972ca).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_ib_patches.fpp 63.12% 56 Missing and 3 partials ⚠️
src/simulation/m_ibm.fpp 50.00% 23 Missing and 13 partials ⚠️
src/common/m_model.fpp 37.03% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1157      +/-   ##
==========================================
+ Coverage   44.07%   44.24%   +0.17%     
==========================================
  Files          70       70              
  Lines       20431    20505      +74     
  Branches     1974     1950      -24     
==========================================
+ Hits         9004     9073      +69     
- Misses      10291    10306      +15     
+ Partials     1136     1126      -10     


@danieljvickers marked this pull request as ready for review February 19, 2026 18:39
cubic-dev-ai bot (Contributor) left a comment

7 issues found across 12 files

Confidence score: 2/5

  • High risk due to concrete breakages: stl_bounding_boxes re-allocation in src/simulation/m_ib_patches.fpp can crash at runtime with multiple STL patches, and src/simulation/m_compute_levelset.fpp changes make if (interpolate) invalid in Fortran 3D path.
  • An unconditional interpolate = .false. in src/simulation/m_ib_patches.fpp disables STL interpolation, making the intended logic dead and likely altering results.
  • Pay close attention to src/simulation/m_ib_patches.fpp and src/simulation/m_compute_levelset.fpp - crash/compile risks around interpolation and allocation.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="toolchain/mfc/case_validator.py">

<violation number="1" location="toolchain/mfc/case_validator.py:687">
P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</violation>
</file>

<file name="src/simulation/m_compute_levelset.fpp">

<violation number="1" location="src/simulation/m_compute_levelset.fpp:647">
P1: Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</violation>
</file>

<file name="src/simulation/m_time_steppers.fpp">

<violation number="1" location="src/simulation/m_time_steppers.fpp:805">
P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</violation>
</file>

<file name="src/common/m_model.fpp">

<violation number="1" location="src/common/m_model.fpp:550">
P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</violation>
</file>

<file name="src/simulation/m_ib_patches.fpp">

<violation number="1" location="src/simulation/m_ib_patches.fpp:1026">
P1: Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</violation>

<violation number="2" location="src/simulation/m_ib_patches.fpp:1069">
P1: Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</violation>
</file>

<file name="src/simulation/m_ibm.fpp">

<violation number="1" location="src/simulation/m_ibm.fpp:114">
P3: `s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-        logical :: interpolate
+        integer :: interpolate

P1: Changing interpolate to INTEGER breaks the if (interpolate) then check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_compute_levelset.fpp, line 647:

<comment>Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</comment>

<file context>
@@ -655,7 +644,7 @@ contains
 
         integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-        logical :: interpolate
+        integer :: interpolate
         real(wp), dimension(1:3) :: center, xyz_local
         real(wp) :: normals(1:3) !< Boundary normal buffer
</file context>

else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

P1: Debugging override left in: interpolate = .false. unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1026:

<comment>Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</comment>

<file context>
@@ -865,83 +806,368 @@ contains
+                else
+                    call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
+                end if
+                interpolate = .false.
+
+                ! Show the number of edges and boundary edges in 2D STL models
</file context>

! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
! cast sample rays in all directions
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp

P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/common/m_model.fpp, line 550:

<comment>Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</comment>

<file context>
@@ -493,58 +526,116 @@ contains
+                ! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
+                ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
+                ! cast sample rays in all directions
+                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+            end do
             ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
</file context>
Suggested change
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

self.prohibit(ib and n <= 0,
"Immersed Boundaries do not work in 1D (requires n > 0)")
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),

P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At toolchain/mfc/case_validator.py, line 687:

<comment>Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</comment>

<file context>
@@ -684,7 +684,7 @@ def check_ibm(self):
         self.prohibit(ib and n <= 0,
                      "Immersed Boundaries do not work in 1D (requires n > 0)")
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
                      "num_ibs must be between 1 and num_patches_max (10)")
         self.prohibit(not ib and num_ibs > 0,
</file context>
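
A minimal sketch of the suggested fix, deriving the message from the same bound as the check so they cannot drift apart again (the constant name and the validator interface here are assumptions, not the toolchain's actual API):

NUM_PATCHES_MAX = 1000  # assumed to mirror num_patches_max in m_constants.fpp

def check_num_ibs(ib: bool, num_ibs: int) -> None:
    # Build the message from the same constant used in the check.
    if ib and (num_ibs <= 0 or num_ibs > NUM_PATCHES_MAX):
        raise ValueError(
            f"num_ibs must be between 1 and num_patches_max ({NUM_PATCHES_MAX})")

check_num_ibs(ib=True, num_ibs=500)   # passes silently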

integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")

P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_time_steppers.fpp, line 805:

<comment>Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</comment>

<file context>
@@ -802,6 +802,8 @@ contains
         integer :: i
         logical :: forces_computed
 
+        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+
         forces_computed = .false.
</file context>
Suggested change
call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")

@@ -28,6 +28,8 @@ module m_ibm


P3: s_populate_ib_buffers is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in s_ibm_setup and s_update_mib. Consider removing the subroutine to avoid confusion for future contributors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ibm.fpp, line 114:

<comment>`s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</comment>

<file context>
@@ -96,19 +100,18 @@ contains
-        call s_apply_ib_patches(ib_markers%sf(0:m, 0:n, 0:p))
         $:GPU_UPDATE(device='[ib_markers%sf]')
-        call s_populate_ib_buffers()
+        call s_apply_ib_patches(ib_markers)
         $:GPU_UPDATE(host='[ib_markers%sf]')
         do i = 1, num_ibs
</file context>

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/simulation/m_time_steppers.fpp (1)

805-855: ⚠️ Potential issue | 🟡 Minor

Fix NVTX range label typo for consistent profiling.

The label reads “IMERSED” instead of “IMMERSED”, which can break grouping in profiling tools.

✏️ Proposed fix
-        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+        call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_time_steppers.fpp` around lines 805 - 855, The NVTX range
label passed to nvtxStartRange in the MTimeSteppers routine is misspelled
("PROPAGATE-IMERSED-BOUNDARIES"); update the string to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools group this region correctly
(look for the nvtxStartRange(...) call in the do-loop that updates patch_ib and
ensure nvtxEndRange remains paired).
src/simulation/m_compute_levelset.fpp (1)

641-709: ⚠️ Potential issue | 🔴 Critical

Fix integer interpolate used as logical in 3D model path (won't compile).

Line 691 uses if (interpolate) then but interpolate is declared as integer (line 647). This is not valid Fortran and will fail compilation. The 2D branch already correctly uses if (interpolate == 1) then (line 704).

Proposed fix
             ! Get the shortest distance between the cell center and the interpolated model boundary
-            if (interpolate) then
+            if (interpolate == 1) then
                 gp%levelset = f_interpolated_distance(gpu_interpolated_boundary_v(:, :, patch_id), total_vertices, xyz_local)
             else
                 gp%levelset = distance
             end if
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_compute_levelset.fpp` around lines 641 - 709, The integer
variable interpolate in subroutine s_model_levelset is being used as a logical
in the 3D branch (if (interpolate) then) which will not compile; update that
conditional to explicitly compare the integer (e.g., if (interpolate == 1) then)
or change the variable to a logical and adjust all uses accordingly so that the
3D path (the block that currently calls f_interpolated_distance or assigns
gp%levelset = distance) uses a valid boolean check on interpolate.
src/simulation/m_ibm.fpp (3)

589-676: ⚠️ Potential issue | 🟠 Major

Add patch_id to the GPU private list in s_find_ghost_points.

patch_id is assigned in every iteration (lines 628, 666) and used only within that iteration. Without being declared private, multiple GPU threads will access a shared patch_id variable, causing race conditions when reading from patch_ib(patch_id)%....

Proposed fix
-        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
+        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx,patch_id]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 589 - 676, The variable patch_id in
s_find_ghost_points is written per-iteration but not listed as private in the
$:GPU_PARALLEL_LOOP, causing concurrent threads to race when reading
patch_ib(patch_id); update the GPU pragma for s_find_ghost_points to include
patch_id in the private list (alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each
thread gets its own patch_id before using patch_ib(patch_id)%... .

441-534: ⚠️ Potential issue | 🟠 Major

Fix pointer association and silent out-of-bounds risk in GPU kernel image-point search.

Inside the GPU-parallel loop, s_cc => x_cc/y_cc/z_cc is pointer association within device code, and the bounds safety check (lines 509–521) is compiled out for GPU builds (#if !defined(MFC_OpenACC)). This means the do while loop can walk index beyond [-buff_size, bound] silently, storing invalid values in ip_grid without error or warning on GPU.

Recommended fixes:

  • Add explicit bounds clamping inside the loop: if index reaches the boundary, exit and clamp to a valid index.
  • Alternatively, replace the linear search with a binary search over monotone s_cc for both safety and performance.
Current unsafe loop (lines 506–524)
do while ((temp_loc < s_cc(index) &
           .or. temp_loc > s_cc(index + 1)))
    index = index + dir
#if !defined(MFC_OpenACC) && !defined(MFC_OpenMP)
    if (index < -buff_size .or. index > bound) then
        ! error handling (skipped on GPU)
        error stop "Ghost Point and Image Point on Different Processors"
    end if
#endif
end do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 441 - 534, s_compute_image_points
contains a GPU-parallel do-while that advances index using pointer s_cc and can
run out-of-bounds on device builds because the host-only error check is compiled
out; fix by adding an in-device-safe boundary clamp or replace the linear scan
with a binary search over s_cc. Specifically, inside the loop that updates index
(references: s_cc, index, dir, temp_loc, bound, buff_size) add an explicit check
that if index reaches the lower/upper limits (-buff_size or bound) you break the
loop and clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.

967-1120: ⚠️ Potential issue | 🟠 Major

Add GPU_UPDATE(device=...) after MPI reductions to maintain self-contained GPU synchronization.

The MPI reductions at lines 1108–1109 occur on the host, followed by host-side writes to patch_ib(i)%force and patch_ib(i)%torque. While the caller (s_update_mib at line 940 in m_time_steppers.fpp) does eventually sync patch_ib to device, making s_compute_ib_forces self-contained by adding GPU_UPDATE(device='[patch_ib(1:num_ibs)]') after the final loop ensures coherence is handled within the function and aligns with the guideline: "Wrap MPI calls with GPU_UPDATE to ensure GPU-host synchronization before and after MPI communication."

Suggested fix
         do i = 1, num_ibs
             patch_ib(i)%force(:) = forces(i, :)
             patch_ib(i)%torque(:) = matmul(patch_ib(i)%rotation_matrix_inverse, torques(i, :))
         end do
+
+        ! Sync updated patch_ib to device for GPU kernels
+        $:GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 967 - 1120, The MPI reductions in
s_compute_ib_forces (calls to s_mpi_allreduce_vectors_sum for forces and
torques) are host-side and must be bracketed by a GPU update to ensure
device/host coherence; insert GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
immediately after the two s_mpi_allreduce_vectors_sum calls and before any
writes to patch_ib(i)%force or patch_ib(i)%torque so the routine handles
synchronization itself.
src/common/m_model.fpp (1)

498-576: ⚠️ Potential issue | 🟠 Major

Ray direction generation is incorrect and risks biased/degenerate rays.

Ray directions in both f_model_is_inside (line 550) and f_model_is_inside_flat (line 608) are computed as point(k) + f_model_random_number(rand_seed) - 0.5_wp. This couples the direction to absolute coordinates, causing heavy bias for large coordinate values. Additionally, both functions normalize without guarding against zero-length vectors, risking division by zero.

Ray directions should be independent of the point's absolute location—only the ray origin should depend on point. The fix removes the point(k) term and adds max(..., sgm_eps) guards on normalization (consistent with existing patterns in the codebase):

Proposed changes
Line 550 (f_model_is_inside):
-                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

Line 552:
-            ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
+            ray_dirs(i, :) = ray_dirs(i, :)/max(sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :))), sgm_eps)

Line 608 (f_model_is_inside_flat):
-                dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                dir(k) = f_model_random_number(rand_seed) - 0.5_wp

Lines 610–611:
-            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
-            dir(:) = dir(:)/dir_mag
+            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
+            dir(:) = dir(:)/max(dir_mag, sgm_eps)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 498 - 576, The ray direction generation
in f_model_is_inside (and similarly f_model_is_inside_flat) incorrectly adds
point(k) to the random direction (coupling direction to absolute coordinates)
and normalizes without guarding against zero length; fix by computing
ray_dirs(i,k) = f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and
when normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).
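
To make the suggested behavior concrete, below is a small, self-contained Python sketch of a ray-casting inside/outside test with directions drawn around zero and an epsilon-guarded normalization. It is illustrative only: the intersection routine, helper names, and voting scheme are not taken from m_model.fpp and may differ from what f_model_is_inside/f_model_is_inside_flat actually do.

import numpy as np

EPS = 1e-12

def ray_hits_triangle(origin, direction, v0, v1, v2):
    """Moller-Trumbore ray/triangle intersection; True on a forward hit."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < EPS:                       # ray parallel to triangle plane
        return False
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return np.dot(e2, q) * inv > EPS         # hit must lie in front of the origin

def point_inside(point, triangles, rng, n_rays=8, spacing=0.0):
    """Crossing-number test: inside if most rays cross the surface an odd number of times."""
    inside_votes = 0
    for _ in range(n_rays):
        # jitter the origin (optionally) to sample a volume fraction, as in the PR
        origin = point + (rng.random(3) - 0.5) * spacing
        # direction is random about zero -- NOT offset by the point's coordinates
        direction = rng.random(3) - 0.5
        direction /= max(np.linalg.norm(direction), EPS)   # guard against zero length
        crossings = sum(ray_hits_triangle(origin, direction, *tri) for tri in triangles)
        inside_votes += crossings % 2
    return inside_votes > n_rays // 2

# Hypothetical usage with a unit tetrahedron
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
triangles = [tuple(verts[i] for i in f) for f in faces]
rng = np.random.default_rng(0)
print(point_inside(np.array([0.2, 0.2, 0.2]), triangles, rng))   # True
print(point_inside(np.array([2.0, 2.0, 2.0]), triangles, rng))   # False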
src/simulation/m_ib_patches.fpp (2)

169-323: ⚠️ Potential issue | 🟠 Major

Add loop-local scalars xa, yc, and dycdxc to the private list in GPU_PARALLEL_LOOP.

Lines 265–266: These three scalars are assigned per-iteration inside the GPU-parallel region (lines 274–280) but are not declared private, creating a race condition where multiple GPU threads write to the same global variables. Update the private clause to private='[i,j,xy_local,k,f,xa,yc,dycdxc]'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 169 - 323, The
GPU_PARALLEL_LOOP in subroutine s_ib_airfoil currently omits loop-local scalars
xa, yc, and dycdxc from its private list, causing a race condition; update the
GPU_PARALLEL_LOOP directive's private clause to include xa, yc, and dycdxc (so
it becomes private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets
its own copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc
and writes into ib_markers%sf.

28-112: ⚠️ Potential issue | 🟠 Major

Public API change: s_apply_ib_patches now takes type(integer_field) — all callers verified and updated correctly.

s_apply_ib_patches changed its argument type (line 60). Both call sites in src/simulation/m_ibm.fpp (lines 114, 945) correctly pass type(integer_field) as argument. This breaking signature change affects the public API and is appropriately flagged for review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 28 - 112, s_apply_ib_patches
now expects an argument of type(integer_field) (the signature changed), so
update all call sites and any interface/wrapper that referenced the old
signature; specifically ensure the calls in m_ibm.fpp (the two places that were
already updated) match the new prototype and remove/adjust any legacy
declarations or interface blocks that declared s_apply_ib_patches with the old
argument type, and update any documentation/comments that describe
s_apply_ib_patches to reflect it takes a type(integer_field) argument.
🧹 Nitpick comments (2)
src/common/m_constants.fpp (1)

25-25: Run all three targets after updating shared constants.

This change is in src/common/, so please ensure pre_process, simulation, and post_process targets are exercised to catch any cross-target regressions.
Based on learnings: “Changes to src/common/ affect all three executables (pre_process, simulation, post_process); test all three targets.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_constants.fpp` at line 25, You changed the shared constant
num_patches_max in src/common/m_constants.fpp which can impact all executables;
rebuild and run the pre_process, simulation, and post_process targets (or their
test suites) to exercise cross-target behavior and catch regressions —
specifically rebuild the three binaries that consume num_patches_max
(pre_process, simulation, post_process), run their full test inputs or
integration runs, and verify no runtime assertions, array bounds, or config
mismatches occur after the change.
src/simulation/m_ibm.fpp (1)

536-586: Use reduction instead of atomic operations to improve scaling for grid-cell classification.

num_gps_local and num_inner_gps_local are currently incremented via GPU_ATOMIC(update) for every classified cell, causing high contention on large grids. Since these are simple sums, use GPU_PARALLEL_LOOP reductions instead:

$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]', &
    & reduction='[[num_gps_local, num_inner_gps_local]]', reductionOp='[+, +]')

Replace the copy directive and remove both GPU_ATOMIC calls. Gang/vector reductions avoid per-thread atomics and scale significantly better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 536 - 586, The loop in subroutine
s_find_num_ghost_points uses per-cell GPU_ATOMIC updates on num_gps_local and
num_inner_gps_local causing contention; replace the copy-based GPU_PARALLEL_LOOP
with a reduction-based directive so the counts are accumulated via GPU
reductions instead of atomics (use GPU_PARALLEL_LOOP with collapse=3,
private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]' and a
reduction clause for [num_gps_local, num_inner_gps_local] with '+'), remove the
two GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.
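
The reduction-versus-atomics point can be seen in miniature with a host-side counting example (purely illustrative; the actual change would be to the GPU_PARALLEL_LOOP/GPU_ATOMIC fpp directives, not Python): instead of every thread bumping one shared counter, each cell contributes a 0/1 flag that a single reduction sums.

import numpy as np

def count_tagged_cells_counter_style(markers):
    """Serial stand-in for the per-cell atomic-increment pattern."""
    count = 0
    for value in markers.ravel():
        if value > 0:           # cell tagged by an immersed boundary
            count += 1          # every hit updates the same counter
    return count

def count_tagged_cells_reduction_style(markers):
    """Per-cell 0/1 contributions combined by one sum reduction."""
    return int(np.count_nonzero(markers > 0))

markers = np.zeros((64, 64, 64), dtype=np.int32)
markers[10:20, 10:20, 10:20] = 1     # a fake tagged region
assert count_tagged_cells_counter_style(markers) == count_tagged_cells_reduction_style(markers) == 1000

On a GPU the same idea lets gang/vector lanes accumulate partial sums privately and combine them once, avoiding contention on num_gps_local and num_inner_gps_local.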
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common/m_derived_types.fpp`:
- Around line 186-199: The t_model_array derived type declares scalar fields
boundary_edge_count, total_vertices, interpolate and ntrs without defaults which
can leave garbage values; add default initializers for these scalars in the type
definition (set boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0)
so all instances start deterministic before any assignment, and then run and
verify pre_process, post_process and simulation builds/tests to ensure the
initialization change doesn't break their workflows; update any constructors or
code that relied on uninitialized behavior if necessary.

In `@src/common/m_model.fpp`:
- Around line 1209-1267: The subroutine f_distance_normals_3D_flat can index
trs_n(:, tri_idx, pid) with tri_idx==0 when ntrs <= 0; add a guard at the top of
f_distance_normals_3D_flat that checks if ntrs <= 0 and in that case set
distance to a safe value (e.g. a large positive sentinel or 0._wp as
appropriate) and set normals to a defined default (e.g. zero vector) and return
immediately to avoid using tri_idx/uninitialized indices; ensure you reference
and update tri_idx, distance and normals in this early-return path so no
out-of-bounds access to trs_n occurs.
- Around line 20-42: Add a module finalizer in m_model.fpp (e.g., subroutine
s_finalize_model or a MODULE PROCEDURE bound to final) that checks
ALLOCATED(...) and calls DEALLOCATE(...) for all public allocatable arrays
gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes
and models to ensure GPU/host memory is freed during simulation cleanup; export
or document the finalizer so the cleanup path in your driver or cleanup routine
calls it. Also remove the unnecessary "use m_model" import from
src/pre_process/m_icpp_patches.fpp (or the specific pre_process file mentioned)
since it does not reference the gpu_* arrays or models, eliminating the unwanted
coupling.
- Around line 1402-1415: s_pack_model_for_gpu currently unconditionally
allocates ma%trs_v and ma%trs_n; make it re-entrant by checking existing
allocations before allocating: set ma%ntrs = ma%model%ntrs, then for ma%trs_v
and ma%trs_n use allocated(...) to detect prior allocation and only allocate if
not allocated or if the current allocation dimensions don't match ma%ntrs (in
that case deallocate then allocate with the correct bounds); ensure you copy
into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation checks complete.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 954-1169: s_instantiate_STL_models has four fixes: allocate
stl_bounding_boxes once before the patch loop with shape (num_ibs,1:3,1:3)
instead of calling allocate(stl_bounding_boxes(patch_id,...)) inside the loop;
ensure total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
- Around line 828-953: s_update_mib’s restart path must refresh the
device-resident model arrays before generating markers: ensure
s_instantiate_STL_models() (or an equivalent GPU_ENTER_DATA/refresh routine for
module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n) is called or re-entered before
s_apply_ib_patches() in the restart branch so device data reflects any changes
from s_update_ib_rotation_matrix(); in s_ib_model and s_ib_3d_model, add
explicit device presence to the GPU_PARALLEL_LOOP clauses by including
present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' (in addition to existing copyin) to
avoid relying on implicit persistence; and narrow the 2D scan in s_ib_model by
reusing the bounding-index logic from s_ib_3d_model (compute corner_distance
from stl_bounding_boxes and call get_bounding_indices to set i/j limits) to
avoid scanning the full grid for sparse STL patches.

In `@toolchain/mfc/case_validator.py`:
- Around line 687-688: The error message passed to self.prohibit for the num_ibs
check is stale (mentions 10) while the actual upper bound is 1000; update the
message to reflect the real max and, to prevent future drift, construct the
message using the authoritative constraint value (e.g. the CONSTRAINTS constant
or the class attribute used for bounds) rather than a hardcoded literal—locate
the self.prohibit call that checks num_ibs (the line with self.prohibit(ib and
(num_ibs <= 0 or num_ibs > 1000), ...)) and interpolate/format the upper-bound
from CONSTRAINTS (or self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as
appropriate) so the message reads "num_ibs must be between 1 and <max>".

---

Outside diff comments:
In `@src/common/m_model.fpp`:
- Around line 498-576: The ray direction generation in f_model_is_inside (and
similarly f_model_is_inside_flat) incorrectly adds point(k) to the random
direction (coupling direction to absolute coordinates) and normalizes without
guarding against zero length; fix by computing ray_dirs(i,k) =
f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and when
normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).

In `@src/simulation/m_compute_levelset.fpp`:
- Around line 641-709: The integer variable interpolate in subroutine
s_model_levelset is being used as a logical in the 3D branch (if (interpolate)
then) which will not compile; update that conditional to explicitly compare the
integer (e.g., if (interpolate == 1) then) or change the variable to a logical
and adjust all uses accordingly so that the 3D path (the block that currently
calls f_interpolated_distance or assigns gp%levelset = distance) uses a valid
boolean check on interpolate.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 169-323: The GPU_PARALLEL_LOOP in subroutine s_ib_airfoil
currently omits loop-local scalars xa, yc, and dycdxc from its private list,
causing a race condition; update the GPU_PARALLEL_LOOP directive's private
clause to include xa, yc, and dycdxc (so it becomes
private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets its own
copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc and
writes into ib_markers%sf.
- Around line 28-112: s_apply_ib_patches now expects an argument of
type(integer_field) (the signature changed), so update all call sites and any
interface/wrapper that referenced the old signature; specifically ensure the
calls in m_ibm.fpp (the two places that were already updated) match the new
prototype and remove/adjust any legacy declarations or interface blocks that
declared s_apply_ib_patches with the old argument type, and update any
documentation/comments that describe s_apply_ib_patches to reflect it takes a
type(integer_field) argument.

In `@src/simulation/m_ibm.fpp`:
- Around line 589-676: The variable patch_id in s_find_ghost_points is written
per-iteration but not listed as private in the $:GPU_PARALLEL_LOOP, causing
concurrent threads to race when reading patch_ib(patch_id); update the GPU
pragma for s_find_ghost_points to include patch_id in the private list
(alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each thread gets its own patch_id
before using patch_ib(patch_id)%... .
- Around line 441-534: s_compute_image_points contains a GPU-parallel do-while
that advances index using pointer s_cc and can run out-of-bounds on device
builds because the host-only error check is compiled out; fix by adding an
in-device-safe boundary clamp or replace the linear scan with a binary search
over s_cc. Specifically, inside the loop that updates index (references: s_cc,
index, dir, temp_loc, bound, buff_size) add an explicit check that if index
reaches the lower/upper limits (-buff_size or bound) you break the loop and
clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.
- Around line 967-1120: The MPI reductions in s_compute_ib_forces (calls to
s_mpi_allreduce_vectors_sum for forces and torques) are host-side and must be
bracketed by a GPU update to ensure device/host coherence; insert
GPU_UPDATE(device='[patch_ib(1:num_ibs)]') immediately after the two
s_mpi_allreduce_vectors_sum calls and before any writes to patch_ib(i)%force or
patch_ib(i)%torque so the routine handles synchronization itself.

In `@src/simulation/m_time_steppers.fpp`:
- Around line 805-855: The NVTX range label passed to nvtxStartRange in the
MTimeSteppers routine is misspelled ("PROPAGATE-IMERSED-BOUNDARIES"); update the
string to the correct label "PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools
group this region correctly (look for the nvtxStartRange(...) call in the
do-loop that updates patch_ib and ensure nvtxEndRange remains paired).

---

Nitpick comments:
In `@src/common/m_constants.fpp`:
- Line 25: You changed the shared constant num_patches_max in
src/common/m_constants.fpp which can impact all executables; rebuild and run the
pre_process, simulation, and post_process targets (or their test suites) to
exercise cross-target behavior and catch regressions — specifically rebuild the
three binaries that consume num_patches_max (pre_process, simulation,
post_process), run their full test inputs or integration runs, and verify no
runtime assertions, array bounds, or config mismatches occur after the change.

In `@src/simulation/m_ibm.fpp`:
- Around line 536-586: The loop in subroutine s_find_num_ghost_points uses
per-cell GPU_ATOMIC updates on num_gps_local and num_inner_gps_local causing
contention; replace the copy-based GPU_PARALLEL_LOOP with a reduction-based
directive so the counts are accumulated via GPU reductions instead of atomics
(use GPU_PARALLEL_LOOP with collapse=3, private='[i,j,k,ii,jj,kk,is_gp]',
firstprivate='[gp_layers,gp_layers_z]' and a reduction clause for
[num_gps_local, num_inner_gps_local] with '+'), remove the two
GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.

Comment on lines 186 to 199
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
-        logical :: interpolate
+        integer :: interpolate

! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the current state of the file at the specified lines
cat -n src/common/m_derived_types.fpp | sed -n '180,210p'

Repository: MFlowCode/MFC

Length of output: 1607


🏁 Script executed:

# Search for all uses of %interpolate across the codebase
rg -nP '%interpolate\b' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 344


🏁 Script executed:

# Look for suspicious logical patterns with %interpolate
rg -nP '%interpolate\b.*(\.(true|false)|\.not\.)' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check if there are any conditional patterns that may need review
rg -nP 'if\s*\(.*%interpolate' src --type-add 'fpp:*.fpp' --type 'fpp' -A 2

Repository: MFlowCode/MFC

Length of output: 854


🏁 Script executed:

# Check for assignments to %interpolate in the codebase
rg -nP '%interpolate\s*=' src --type-add 'fpp:*.fpp' --type 'fpp' -B 1 -A 1

Repository: MFlowCode/MFC

Length of output: 429


t_model_array scalars must be initialized to prevent garbage values reaching GPU or uninitialized field reads.

All uses of %interpolate correctly employ integer semantics (0/1, no logical operators); however, the type definition still lacks default initialization for boundary_edge_count, total_vertices, interpolate, and ntrs. Uninitialized scalars can leak garbage into GPU staging arrays or cause undefined behavior if any code path reads these fields before assignment.

Apply default initialization
         integer :: boundary_edge_count = 0
         integer :: total_vertices = 0
         integer :: interpolate = 0

         integer :: ntrs = 0  ! copy of model%ntrs

Additionally, verify that changes do not break pre_process and post_process executables, not just simulation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
logical :: interpolate
integer :: interpolate
! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count = 0
integer :: total_vertices = 0
integer :: interpolate = 0
! GPU-friendly flattened arrays
integer :: ntrs = 0 ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_derived_types.fpp` around lines 186 - 199, The t_model_array
derived type declares scalar fields boundary_edge_count, total_vertices,
interpolate and ntrs without defaults which can leave garbage values; add
default initializers for these scalars in the type definition (set
boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0) so all instances
start deterministic before any assignment, and then run and verify pre_process,
post_process and simulation builds/tests to ensure the initialization change
doesn't break their workflows; update any constructors or code that relied on
uninitialized behavior if necessary.

Comment on lines +20 to 42
public :: f_model_read, s_model_write, s_model_free, f_model_is_inside, models, gpu_ntrs, &
gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count, &
gpu_total_vertices, stl_bounding_boxes

! Subroutines for STL immersed boundaries
public :: f_check_boundary, f_register_edge, f_check_interpolation_2D, &
f_check_interpolation_3D, f_interpolate_2D, f_interpolate_3D, &
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area, s_pack_model_for_gpu, &
f_model_is_inside_flat, f_distance_normals_3d_flat

!! array of STL models that can be allocated and then used in IB marker and levelset compute
type(t_model_array), allocatable, target :: models(:)
!! GPU-friendly flat arrays for STL model data
integer, allocatable :: gpu_ntrs(:)
real(wp), allocatable, dimension(:, :, :, :) :: gpu_trs_v
real(wp), allocatable, dimension(:, :, :) :: gpu_trs_n
real(wp), allocatable, dimension(:, :, :, :) :: gpu_boundary_v
real(wp), allocatable, dimension(:, :, :) :: gpu_interpolated_boundary_v
integer, allocatable :: gpu_interpolate(:)
integer, allocatable :: gpu_boundary_edge_count(:)
integer, allocatable :: gpu_total_vertices(:)
real(wp), allocatable :: stl_bounding_boxes(:, :, :)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find allocation and deallocation sites for the new module variables
echo "=== Allocation/Deallocation patterns ==="
rg -nP --type fpp '(allocate|deallocate)\s*\(' src/common/m_model.fpp | head -30

echo -e "\n=== GPU array allocations in entire src/common ==="
rg -nP --type fpp 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|gpu_interpolate|gpu_boundary_edge_count|gpu_total_vertices|stl_bounding_boxes' src/common/ | grep -i allocate | head -20

echo -e "\n=== Where m_model is imported ==="
rg -n 'use.*m_model' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 622


🏁 Script executed:

#!/bin/bash
# Check the finalization routine referenced in the public interface
echo "=== s_model_free subroutine definition ==="
rg -A 50 'subroutine\s+s_model_free' src/common/m_model.fpp | head -60

echo -e "\n=== s_model_write subroutine definition ==="
rg -A 30 'subroutine\s+s_model_write' src/common/m_model.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 2632


🏁 Script executed:

#!/bin/bash
# Check if pre_process or post_process reference the new public variables or m_model
echo "=== pre_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/pre_process/ 2>/dev/null | head -10

echo -e "\n=== post_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/post_process/ 2>/dev/null | head -10

echo -e "\n=== Check if pre_process or post_process import m_model ==="
rg -n 'use.*m_model' src/pre_process/ src/post_process/ 2>/dev/null

Repository: MFlowCode/MFC

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Get the full m_model.fpp file to understand context
wc -l src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Read the module to find allocation patterns
# Focus on where the public arrays are allocated
sed -n '20,100p' src/common/m_model.fpp | cat -n

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Search for allocation patterns - try broader search
rg -n 'allocate' src/common/m_model.fpp | head -30

Repository: MFlowCode/MFC

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Look for initialization subroutines that might allocate these arrays
rg -n 'subroutine.*init\|subroutine.*allocate\|subroutine.*setup' src/common/m_model.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where models(:) is allocated
rg -B 3 -A 3 'models\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 430


🏁 Script executed:

#!/bin/bash
# Look for gpu_ntrs allocation pattern
rg -B 3 -A 3 'gpu_ntrs\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 490


🏁 Script executed:

#!/bin/bash
# Check how these are used in simulation to understand lifecycle
echo "=== Usage in m_ibm.fpp ==="
rg -B 2 -A 2 'gpu_ntrs\|gpu_trs_v\|gpu_boundary_v\|models\(' src/simulation/m_ibm.fpp | head -40

echo -e "\n=== Usage in pre_process/m_icpp_patches.fpp ==="
rg -B 2 -A 2 'models\(' src/pre_process/m_icpp_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Search for allocation of the MODULE-LEVEL arrays in calling modules
echo "=== Search for allocate of gpu_ntrs in entire codebase ==="
rg 'allocate.*gpu_ntrs\|gpu_ntrs.*allocate' src/ | head -20

echo -e "\n=== Search for s_pack_model_for_gpu ==="
rg -n 's_pack_model_for_gpu' src/ | head -20

echo -e "\n=== Look for where gpu arrays are assigned/populated ==="
rg -n 'gpu_ntrs\s*=' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Search for deallocate of module-level arrays
echo "=== Deallocate of module-level gpu arrays ==="
rg 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*models\(' src/ | head -20

echo -e "\n=== Look for finalization in m_ibm.fpp ==="
rg -n 'finalize\|cleanup\|deallocate' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Look at s_pack_model_for_gpu implementation
echo "=== s_pack_model_for_gpu subroutine ==="
rg -A 80 'subroutine\s+s_pack_model_for_gpu' src/common/m_model.fpp | head -100

Repository: MFlowCode/MFC

Length of output: 519


🏁 Script executed:

#!/bin/bash
# Final verification: confirm module-level arrays have no lifecycle
echo "=== Complete scan for any allocation pattern ==="
rg 'allocate\s*\(' src/common/m_model.fpp src/simulation/m_ib_patches.fpp src/simulation/m_ibm.fpp | grep -E 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\s*\(' || echo "NO ALLOCATIONS FOUND for public module arrays"

echo -e "\n=== Confirm these are truly module-level (public scope) ==="
rg -B 1 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Search for deallocation of these specific arrays
echo "=== Deallocate patterns for public module arrays ==="
rg -n 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*gpu_boundary_v\|deallocate.*models' src/ | head -20

echo -e "\n=== Broader search for deallocate in m_ib_patches.fpp ==="
rg -n 'deallocate' src/simulation/m_ib_patches.fpp | head -20

echo -e "\n=== Context around gpu array allocations in m_ib_patches.fpp ==="
rg -B 5 -A 10 'allocate.*gpu_ntrs' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Check what pre_process actually uses from m_model
echo "=== What does pre_process use from m_model? ==="
sed -n '1,50p' src/pre_process/m_icpp_patches.fpp | grep -A 15 'use m_model'

echo -e "\n=== Any direct usage of gpu_* or models arrays in pre_process? ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\(' src/pre_process/m_icpp_patches.fpp | head -10

Repository: MFlowCode/MFC

Length of output: 553


Add deallocate() calls for all public module arrays in a finalization routine, and remove unnecessary m_model import from pre_process.

The public module arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) are allocated in m_ib_patches.fpp but never deallocated, causing GPU memory leaks. Create a finalization subroutine in m_model.fpp to deallocate these arrays and call it during simulation cleanup. Additionally, src/pre_process/m_icpp_patches.fpp imports m_model but does not use any of the gpu or models arrays—remove this unnecessary coupling per src/common/ guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 20 - 42, Add a module finalizer in
m_model.fpp (e.g., subroutine s_finalize_model or a MODULE PROCEDURE bound to
final) that checks ALLOCATED(...) and calls DEALLOCATE(...) for all public
allocatable arrays gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v,
gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count,
gpu_total_vertices, stl_bounding_boxes and models to ensure GPU/host memory is
freed during simulation cleanup; export or document the finalizer so the cleanup
path in your driver or cleanup routine calls it. Also remove the unnecessary
"use m_model" import from src/pre_process/m_icpp_patches.fpp (or the specific
pre_process file mentioned) since it does not reference the gpu_* arrays or
models, eliminating the unwanted coupling.
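
A minimal sketch of such a finalizer, assuming it is added to m_model.fpp next to the declarations quoted above and that the simulation cleanup path calls it exactly once; the subroutine name is illustrative, and releasing the matching device copies would additionally need the project's GPU exit-data mechanism, which is not shown here.

    subroutine s_finalize_model_module()

        ! Free every public host-side array this module owns; each deallocate is
        ! guarded so the routine is safe to call even if allocation never happened.
        if (allocated(gpu_ntrs)) deallocate (gpu_ntrs)
        if (allocated(gpu_trs_v)) deallocate (gpu_trs_v)
        if (allocated(gpu_trs_n)) deallocate (gpu_trs_n)
        if (allocated(gpu_boundary_v)) deallocate (gpu_boundary_v)
        if (allocated(gpu_interpolated_boundary_v)) deallocate (gpu_interpolated_boundary_v)
        if (allocated(gpu_interpolate)) deallocate (gpu_interpolate)
        if (allocated(gpu_boundary_edge_count)) deallocate (gpu_boundary_edge_count)
        if (allocated(gpu_total_vertices)) deallocate (gpu_total_vertices)
        if (allocated(stl_bounding_boxes)) deallocate (stl_bounding_boxes)
        if (allocated(models)) deallocate (models)

    end subroutine s_finalize_model_module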

Comment on lines +1209 to +1267
subroutine f_distance_normals_3D_flat(ntrs, trs_v, trs_n, pid, point, normals, distance)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(in) :: ntrs
real(wp), dimension(:, :, :, :), intent(in) :: trs_v
real(wp), dimension(:, :, :), intent(in) :: trs_n
integer, intent(in) :: pid
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(out) :: normals
real(wp), intent(out) :: distance

real(wp), dimension(1:3, 1:3) :: tri
real(wp) :: dist_min, dist_t_min
real(wp) :: dist_min_normal, dist_buffer_normal
real(wp), dimension(1:3) :: midp
real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp
dist_min_normal = 1.e12_wp
distance = 0._wp

tri_idx = 0
do i = 1, ntrs
do j = 1, 3
tri(j, 1) = trs_v(j, 1, i, pid)
tri(j, 2) = trs_v(j, 2, i, pid)
tri(j, 3) = trs_v(j, 3, i, pid)
dist_buffer(j) = sqrt((point(1) - tri(j, 1))**2 + &
(point(2) - tri(j, 2))**2 + &
(point(3) - tri(j, 3))**2)
end do

do j = 1, 3
midp(j) = (tri(1, j) + tri(2, j) + tri(3, j))/3
end do

dist_t_min = minval(dist_buffer(1:3))
dist_buffer_normal = sqrt((point(1) - midp(1))**2 + &
(point(2) - midp(2))**2 + &
(point(3) - midp(3))**2)

if (dist_t_min < dist_min) then
dist_min = dist_t_min
end if

if (dist_buffer_normal < dist_min_normal) then
dist_min_normal = dist_buffer_normal
tri_idx = i
end if
end do

normals(1) = trs_n(1, tri_idx, pid)
normals(2) = trs_n(2, tri_idx, pid)
normals(3) = trs_n(3, tri_idx, pid)
distance = dist_min

end subroutine f_distance_normals_3D_flat

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bf_distance_normals_3D_flat\s*\(' src

Repository: MFlowCode/MFC

Length of output: 345


🏁 Script executed:

sed -n '680,700p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 913


🏁 Script executed:

rg -nP 'gpu_ntrs' src/simulation/m_compute_levelset.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 186


🏁 Script executed:

rg -nP '\bgpu_ntrs\b' src --type-add 'fpp:*.fpp' | grep -E '(allocate|dimension|intent|gpu_ntrs\s*=)'

Repository: MFlowCode/MFC

Length of output: 206


🏁 Script executed:

sed -n '1110,1160p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 2729


🏁 Script executed:

sed -n '650,710p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 2630


Add guard in f_distance_normals_3D_flat for ntrs <= 0 to prevent out-of-bounds access.

When ntrs is 0 (which can occur if a patch has no model allocated), tri_idx remains uninitialized at 0 and is used to index trs_n(:, tri_idx, pid) at the end, causing an out-of-bounds read. The caller at src/simulation/m_compute_levelset.fpp:688 does not validate gpu_ntrs(patch_id) > 0 before invoking this function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1209 - 1267, The subroutine
f_distance_normals_3D_flat can index trs_n(:, tri_idx, pid) with tri_idx==0 when
ntrs <= 0; add a guard at the top of f_distance_normals_3D_flat that checks if
ntrs <= 0 and in that case set distance to a safe value (e.g. a large positive
sentinel or 0._wp as appropriate) and set normals to a defined default (e.g.
zero vector) and return immediately to avoid using tri_idx/uninitialized
indices; ensure you reference and update tri_idx, distance and normals in this
early-return path so no out-of-bounds access to trs_n occurs.
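
A minimal sketch of the guard, placed right after the declarations of the subroutine quoted above; the 1.e12_wp sentinel mirrors the initializers already used in its body, and the zero normal is an assumed safe default.

        ! Nothing to search for this patch: return defined outputs immediately so
        ! tri_idx is never used as a zero index into trs_n.
        if (ntrs <= 0) then
            normals(1:3) = 0._wp
            distance = 1.e12_wp
            return
        end if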

Comment on lines +1402 to +1415
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))

do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all call sites to s_pack_model_for_gpu
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bs_pack_model_for_gpu\s*\(' src || true

Repository: MFlowCode/MFC

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Also search for the subroutine definition and nearby code
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' -B 5 -A 20 'subroutine s_pack_model_for_gpu' src || true

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# Check if ma%trs_v or ma%trs_n are referenced elsewhere to understand lifecycle
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'ma%trs_v|ma%trs_n' src || true

Repository: MFlowCode/MFC

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Get context around the call site in m_ib_patches.fpp
sed -n '1080,1120p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1850


🏁 Script executed:

#!/bin/bash
# Search for surrounding loop context (broader range)
sed -n '1050,1130p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4004


🏁 Script executed:

#!/bin/bash
# Check if models is deallocated/re-initialized somewhere
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'models\s*\(|deallocate.*models' src/simulation/ | head -30

Repository: MFlowCode/MFC

Length of output: 3130


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models or models are re-initialized in any time-stepping/simulation loop
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 's_instantiate_STL_models|allocate.*models\(' src/simulation/ | grep -v '\.fpp-' | head -20

Repository: MFlowCode/MFC

Length of output: 1135


🏁 Script executed:

#!/bin/bash
# Search for any re-initialization or deallocate of models during simulation
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'deallocate.*models|models.*=|trs_v.*deallocate|trs_n.*deallocate' src | head -20

Repository: MFlowCode/MFC

Length of output: 1318


Make s_pack_model_for_gpu re-entrant by checking allocations before allocating.

The function is called only once per object during initialization (in s_instantiate_STL_models), so there is currently no runtime risk. However, the unconditional allocations of ma%trs_v and ma%trs_n are not defensive: if the routine were ever called again on the same object, allocating an already-allocated array would raise a runtime error. Add pre-allocation checks:

Proposed fix
    ma%ntrs = ma%model%ntrs
+   if (allocated(ma%trs_v)) deallocate(ma%trs_v)
+   if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
subroutine s_pack_model_for_gpu(ma)
    type(t_model_array), intent(inout) :: ma
    integer :: i

    ma%ntrs = ma%model%ntrs
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))

    do i = 1, ma%ntrs
        ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
        ma%trs_n(:, i) = ma%model%trs(i)%n(:)
    end do
end subroutine

Replaced with:

subroutine s_pack_model_for_gpu(ma)
    type(t_model_array), intent(inout) :: ma
    integer :: i

    ma%ntrs = ma%model%ntrs
    if (allocated(ma%trs_v)) deallocate(ma%trs_v)
    if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))

    do i = 1, ma%ntrs
        ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
        ma%trs_n(:, i) = ma%model%trs(i)%n(:)
    end do
end subroutine
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1402 - 1415, s_pack_model_for_gpu
currently unconditionally allocates ma%trs_v and ma%trs_n; make it re-entrant by
checking existing allocations before allocating: set ma%ntrs = ma%model%ntrs,
then for ma%trs_v and ma%trs_n use allocated(...) to detect prior allocation and
only allocate if not allocated or if the current allocation dimensions don't
match ma%ntrs (in that case deallocate then allocate with the correct bounds);
ensure you copy into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation
checks complete.

Comment on lines 828 to 953
!> The STL patch is a 2D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers_sf Array to track patch ids
!! @param STL_levelset STL levelset
!! @param STL_levelset_norm STL levelset normals
subroutine s_ib_model(patch_id, ib_markers_sf)
!! @param ib_markers Array to track patch ids
subroutine s_ib_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
integer, dimension(0:m, 0:n, 0:p), intent(inout) :: ib_markers_sf

integer :: i, j, k !< Generic loop iterators
type(integer_field), intent(inout) :: ib_markers

type(t_model), pointer :: model
integer :: i, j, k !< Generic loop iterators
integer :: spc

real(wp) :: eta
real(wp) :: eta, threshold
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3) :: center, xy_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

model => models(patch_id)%model
center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
if (p > 0) center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

do i = 0, m
do j = 0, n
do k = 0, p
$:GPU_PARALLEL_LOOP(private='[i,j, xy_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=2)
do i = -gp_layers, m + gp_layers
do j = -gp_layers, n + gp_layers

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
if (p > 0) then
xyz_local(3) = z_cc(k) - center(3)
end if
xy_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
xy_local = matmul(inverse_rotation, xy_local)
xy_local = xy_local - offset

if (grid_geometry == 3) then
xy_local = f_convert_cyl_to_cart(xy_local)
end if

eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xy_local, (/dx(i), dy(j), 0._wp/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > threshold) then
ib_markers%sf(i, j, 0) = patch_id
end if

end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model

!> The STL patch is a 3D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers Array to track patch ids
subroutine s_ib_3d_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
type(integer_field), intent(inout) :: ib_markers

integer :: i, j, k, il, ir, jl, jr, kl, kr !< Generic loop iterators
integer :: spc

real(wp) :: eta, threshold, corner_distance
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

il = -gp_layers
jl = -gp_layers
kl = -gp_layers
ir = m + gp_layers
jr = n + gp_layers
kr = p + gp_layers
corner_distance = 0._wp
do i = 1, 3
corner_distance = corner_distance + maxval(abs(stl_bounding_boxes(patch_id, i, 1:3)))**2 ! distance to rim of cylinder
end do
corner_distance = sqrt(corner_distance)
call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
call get_bounding_indices(center(3) - corner_distance, center(3) + corner_distance, z_cc, kl, kr)

$:GPU_PARALLEL_LOOP(private='[i,j,k, xyz_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=3)
do i = il, ir
do j = jl, jr
do k = kl, kr

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), z_cc(k) - center(3)]
xyz_local = matmul(inverse_rotation, xyz_local)
xyz_local = xyz_local - offset

if (grid_geometry == 3) then
xyz_local = f_convert_cyl_to_cart(xyz_local)
end if

if (p == 0) then
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), 0._wp/), patch_ib(patch_id)%model_spc)
else
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), dz(k)/), patch_ib(patch_id)%model_spc)
end if
eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xyz_local, (/dx(i), dy(j), dz(k)/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > patch_ib(patch_id)%model_threshold) then
ib_markers_sf(i, j, k) = patch_id
ib_markers%sf(i, j, k) = patch_id
end if

end do
end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model
end subroutine s_ib_3d_model


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Find s_instantiate_STL_models and s_apply_ib_patches calls to check ordering
echo "=== Looking for s_instantiate_STL_models and s_apply_ib_patches calls ==="
rg -n 's_instantiate_STL_models|s_apply_ib_patches' src/simulation/ --type fpp

echo -e "\n=== Checking if they're in correct order within m_ibm.fpp ==="
rg -n 's_instantiate_STL_models\|s_apply_ib_patches' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 238


🏁 Script executed:

#!/bin/bash
# 2. Find where gpu_ntrs/gpu_trs_v/gpu_trs_n are allocated and managed
echo "=== Finding gpu_ntrs/gpu_trs_v/gpu_trs_n allocations and GPU_ENTER_DATA ==="
rg -n 'allocate.*gpu_ntrs|allocate.*gpu_trs_v|allocate.*gpu_trs_n' src/simulation/ --type fpp

echo -e "\n=== Searching for GPU_ENTER_DATA calls with these variables ==="
rg -n 'GPU_ENTER_DATA' src/simulation/m_ib_patches.fpp -A 5 || rg -n 'GPU_ENTER_DATA' src/simulation/m_ibm.fpp -A 5

Repository: MFlowCode/MFC

Length of output: 821


🏁 Script executed:

#!/bin/bash
# 3. Verify bounding-box logic in both functions
echo "=== Checking s_ib_model for bounding-box logic ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|bounding' || echo "No bounding-box narrowing found in s_ib_model"

echo -e "\n=== Checking s_ib_3d_model for bounding-box logic ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|kl\|kr\|bounding'

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# 4. Check data movement for gpu_ntrs/gpu_trs_v/gpu_trs_n in the GPU loop context
echo "=== Checking GPU_PARALLEL_LOOP copyin clause for these variables ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== In s_ib_3d_model ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== Checking if gpu_trs_v/gpu_trs_n are explicitly referenced ==="
rg -n 'gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 1440


🏁 Script executed:

#!/bin/bash
# Check the broader context of s_apply_ib_patches and see if GPU_ENTER_DATA creates persistent region
rg -n 's_apply_ib_patches' src/simulation/m_ibm.fpp -B 10 -A 20 | head -80

Repository: MFlowCode/MFC

Length of output: 2981


🏁 Script executed:

#!/bin/bash
# Look for GPU_DATA wrapping context around the kernel calls
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 493


🏁 Script executed:

#!/bin/bash
# Check f_model_is_inside_flat signature to see if it expects device arrays
rg -n 'subroutine\s+f_model_is_inside_flat|function\s+f_model_is_inside_flat' src/simulation/ -A 15

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Verify where GPU_ENTER_DATA is called and what scope it governs
rg -n 'subroutine s_instantiate_STL_models' src/simulation/m_ib_patches.fpp -A 300 | tail -150

Repository: MFlowCode/MFC

Length of output: 6345


🏁 Script executed:

#!/bin/bash
# Check if there's a persistent GPU_ENTER_DATA region wrapping both calls
# Look for GPU_ENTER_DATA without matching GPU_EXIT_DATA
rg -n 'GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 185


🏁 Script executed:

#!/bin/bash
# Verify the exact scope of GPU_ENTER_DATA at line 1160 - is it persistent?
sed -n '1140,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1881


🏁 Script executed:

#!/bin/bash
# Check where gpu_ntrs/gpu_trs_v/gpu_trs_n might be declared with present() or accessed post-ENTER_DATA
rg -n 'present\|gpu_ntrs\|gpu_trs_v\|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -A 3 -B 3 'present'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for how GPU data persistence is managed - look for surrounding GPU_DATA or block context
sed -n '1100,1170p' src/simulation/m_ib_patches.fpp | head -80

Repository: MFlowCode/MFC

Length of output: 3699


🏁 Script executed:

#!/bin/bash
# Check if there's a GPU_EXIT_DATA anywhere in the file
rg -n 'GPU_EXIT_DATA' src/simulation/m_ib_patches.fpp || echo "No GPU_EXIT_DATA found"

# Check broader context around GPU_ENTER_DATA call at line 1160
sed -n '1090,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4049


🏁 Script executed:

#!/bin/bash
# Look for how f_model_is_inside_flat uses the arrays - check if it's marked GPU_ROUTINE
rg -n 'f_model_is_inside_flat' src/simulation/ -A 5 | head -40

Repository: MFlowCode/MFC

Length of output: 1164


🏁 Script executed:

#!/bin/bash
# Check the GPU macro definitions to understand data region semantics
fd 'gpuParallelization\|gpu.*\.fpp' src/simulation/ && cat src/simulation/gpuParallelization.fpp 2>/dev/null | head -200 || echo "File not found, searching for macro definitions"
rg -n '@:GPU_PARALLEL_LOOP|@:GPU_DATA|@:GPU_ENTER_DATA' src/ -B 2 | head -50

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if there's a broad GPU_DATA or GPU context managing device persistence around the entire flow
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA|GPU_ROUTINE' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 280


🏁 Script executed:

#!/bin/bash
# Look at the main timestepping loop structure to understand device data persistence scope
rg -n 'subroutine s_time_step\|end subroutine s_time_step' src/simulation/m_ibm.fpp -A 300 | head -150

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if the GPU arrays like gpu_trs_v are declared as module-level allocatables that persist
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -i 'allocatable\|dimension' | head -10

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for module-level declarations of gpu arrays
rg -n 'allocatable.*gpu_ntrs|allocatable.*gpu_trs_v|allocatable.*gpu_trs_n' src/simulation/m_ib_patches.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Find where these gpu arrays are declared at module scope
rg -n 'integer.*allocatable|real.*allocatable' src/simulation/m_ib_patches.fpp | grep -i gpu

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Get the entire module-level variable declarations section
sed -n '1,100p' src/simulation/m_ib_patches.fpp | grep -n 'allocatable\|dimension'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for the module declaration section more broadly
head -200 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 3972


🏁 Script executed:

#!/bin/bash
# Check module-level variable declarations in m_ib_patches
grep -n 'module m_ib_patches' src/simulation/m_ib_patches.fpp -A 50 | head -60

Repository: MFlowCode/MFC

Length of output: 2313


🏁 Script executed:

#!/bin/bash
# Verify whether gpu_trs_v/gpu_trs_n require present() or explicit data movement in nested GPU regions
# Look for documentation or examples in gpuParallelization guidance
rg -n 'present\|GPU_PARALLEL_LOOP' src/simulation/m_ibm.fpp -B 2 -A 3 | head -80

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if any other GPU_PARALLEL_LOOP kernels use variables allocated via GPU_ENTER_DATA elsewhere
rg -n 'GPU_PARALLEL_LOOP' src/simulation/ -A 5 | grep -A 5 'copyin\|present' | head -60

Repository: MFlowCode/MFC

Length of output: 4582


🏁 Script executed:

#!/bin/bash
# Critical check: verify gpu_ntrs/gpu_trs_v/gpu_trs_n are referenced but NOT in data clauses
echo "=== In s_ib_model GPU_PARALLEL_LOOP (line ~850) ==="
sed -n '850,870p' src/simulation/m_ib_patches.fpp

echo -e "\n=== In s_ib_3d_model GPU_PARALLEL_LOOP (line ~920) ==="
sed -n '920,940p' src/simulation/m_ib_patches.fpp

echo -e "\n=== Checking for 'present' in GPU_PARALLEL_LOOP patterns ==="
rg -n 'present=' src/simulation/ | head -10

Repository: MFlowCode/MFC

Length of output: 2233


🏁 Script executed:

#!/bin/bash
# Check if gpu_ntrs/gpu_trs_v/gpu_trs_n are module-level in m_model or elsewhere
fd -e fpp -e f90 | xargs grep -l 'gpu_ntrs\|gpu_trs_v' | head -5
rg -n 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/ -B 2 | head -30

Repository: MFlowCode/MFC

Length of output: 549


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models is called before s_apply_ib_patches in BOTH initial and restart paths
echo "=== Initial setup (lines ~100-120) ==="
sed -n '100,120p' src/simulation/m_ibm.fpp

echo -e "\n=== Restart/update path (lines ~935-950) ==="
sed -n '935,950p' src/simulation/m_ibm.fpp

Repository: MFlowCode/MFC

Length of output: 1505


🏁 Script executed:

#!/bin/bash
# Verify that GPU arrays are module-level and persisted
echo "=== Module scope declaration in m_model.fpp ==="
sed -n '30,40p' src/common/m_model.fpp

echo -e "\n=== Confirm they're never deallocated ==="
rg -n 'deallocate.*gpu_ntrs|deallocate.*gpu_trs_v|deallocate.*gpu_trs_n' src/

Repository: MFlowCode/MFC

Length of output: 810


Critical: GPU model data not refreshed in s_update_mib restart path; also declare persistent GPU arrays explicitly in kernel data clauses.

The gpu_ntrs/gpu_trs_v/gpu_trs_n arrays (module-level in m_model.fpp) are allocated and pushed to device via GPU_ENTER_DATA in s_instantiate_STL_models() with persistent lifetime. However, in the restart path (s_update_mib), s_apply_ib_patches() is called at line 945 without re-calling s_instantiate_STL_models() to refresh GPU data. If s_update_ib_rotation_matrix() modifies model state, the device arrays will be stale on the next marker generation.

Additionally, both s_ib_model and s_ib_3d_model reference gpu_ntrs/gpu_trs_v/gpu_trs_n inside GPU_PARALLEL_LOOP but do not explicitly declare them in the copyin or present clauses—they rely on implicit device persistence. For clarity and robustness, add present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' to the loop declarations.

Finally, s_ib_model (2D) scans the full grid; adding the same bounding-box narrowing logic from s_ib_3d_model would improve performance for sparse STL objects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 828 - 953, s_update_mib’s
restart path must refresh the device-resident model arrays before generating
markers: ensure s_instantiate_STL_models() (or an equivalent
GPU_ENTER_DATA/refresh routine for module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n)
is called or re-entered before s_apply_ib_patches() in the restart branch so
device data reflects any changes from s_update_ib_rotation_matrix(); in
s_ib_model and s_ib_3d_model, add explicit device presence to the
GPU_PARALLEL_LOOP clauses by including present='[gpu_ntrs, gpu_trs_v,
gpu_trs_n]' (in addition to existing copyin) to avoid relying on implicit
persistence; and narrow the 2D scan in s_ib_model by reusing the bounding-index
logic from s_ib_3d_model (compute corner_distance from stl_bounding_boxes and
call get_bounding_indices to set i/j limits) to avoid scanning the full grid for
sparse STL patches.
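
For the presence clause, a minimal sketch of how the 2D loop header in s_ib_model could read, assuming the GPU_PARALLEL_LOOP macro accepts a present argument as this comment proposes; the other clauses are copied from the existing directive.

    $:GPU_PARALLEL_LOOP(private='[i, j, xy_local, eta]', &
        & copyin='[patch_id, center, inverse_rotation, offset, spc, threshold]', &
        & present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]', collapse=2)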

Comment on lines +954 to +1169
subroutine s_instantiate_STL_models()

! Variables for IBM+STL
real(wp) :: normals(1:3) !< Boundary normal buffer
integer :: boundary_vertex_count, boundary_edge_count, total_vertices !< Boundary vertex
real(wp), allocatable, dimension(:, :, :) :: boundary_v !< Boundary vertex buffer
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v !< Interpolated vertex buffer
real(wp) :: dx_local, dy_local, dz_local !< Levelset distance buffer
logical :: interpolate !< Logical variable to determine whether or not the model should be interpolated

integer :: i, j, k !< Generic loop iterators
integer :: patch_id

type(t_bbox) :: bbox, bbox_old
type(t_model) :: model
type(ic_model_parameters) :: params

real(wp) :: eta
real(wp), dimension(1:3) :: point, model_center
real(wp) :: grid_mm(1:3, 1:2)

real(wp), dimension(1:4, 1:4) :: transform, transform_n

dx_local = minval(dx); dy_local = minval(dy)
if (p /= 0) dz_local = minval(dz)

do patch_id = 1, num_ibs
if (patch_ib(patch_id)%geometry == 5 .or. patch_ib(patch_id)%geometry == 12) then
allocate (models(patch_id)%model)
print *, " * Reading model: "//trim(patch_ib(patch_id)%model_filepath)

model = f_model_read(patch_ib(patch_id)%model_filepath)
params%scale(:) = patch_ib(patch_id)%model_scale(:)
params%translate(:) = patch_ib(patch_id)%model_translate(:)
params%rotate(:) = patch_ib(patch_id)%model_rotate(:)
params%spc = patch_ib(patch_id)%model_spc
params%threshold = patch_ib(patch_id)%model_threshold

if (f_approx_equal(dot_product(params%scale, params%scale), 0._wp)) then
params%scale(:) = 1._wp
end if

if (proc_rank == 0) then
print *, " * Transforming model."
end if

! Get the model center before transforming the model
bbox_old = f_create_bbox(model)
model_center(1:3) = (bbox_old%min(1:3) + bbox_old%max(1:3))/2._wp

! Compute the transform matrices for vertices and normals
transform = f_create_transform_matrix(params, model_center)
transform_n = f_create_transform_matrix(params)

call s_transform_model(model, transform, transform_n)

! Recreate the bounding box after transformation
bbox = f_create_bbox(model)

! Show the number of vertices in the original STL model
if (proc_rank == 0) then
print *, ' * Number of input model vertices:', 3*model%ntrs
end if

call f_check_boundary(model, boundary_v, boundary_vertex_count, boundary_edge_count)

! Check if the model needs interpolation
if (p > 0) then
call f_check_interpolation_3D(model, (/dx_local, dy_local, dz_local/), interpolate)
else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

! Show the number of edges and boundary edges in 2D STL models
if (proc_rank == 0 .and. p == 0) then
print *, ' * Number of 2D model boundary edges:', boundary_edge_count
end if

! Interpolate the STL model along the edges (2D) and on triangle facets (3D)
if (interpolate) then
if (proc_rank == 0) then
print *, ' * Interpolating STL vertices.'
end if

if (p > 0) then
call f_interpolate_3D(model, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
else
call f_interpolate_2D(boundary_v, boundary_edge_count, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
end if

if (proc_rank == 0) then
print *, ' * Total number of interpolated boundary vertices:', total_vertices
end if
end if

if (proc_rank == 0) then
write (*, "(A, 3(2X, F20.10))") " > Model: Min:", bbox%min(1:3)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (bbox%min(1:3) + bbox%max(1:3))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", bbox%max(1:3)

grid_mm(1, :) = (/minval(x_cc(0:m)) - 0.5_wp*dx_local, maxval(x_cc(0:m)) + 0.5_wp*dx_local/)
grid_mm(2, :) = (/minval(y_cc(0:n)) - 0.5_wp*dy_local, maxval(y_cc(0:n)) + 0.5_wp*dy_local/)

if (p > 0) then
grid_mm(3, :) = (/minval(z_cc(0:p)) - 0.5_wp*dz_local, maxval(z_cc(0:p)) + 0.5_wp*dz_local/)
else
grid_mm(3, :) = (/0._wp, 0._wp/)
end if

write (*, "(A, 3(2X, F20.10))") " > Domain: Min:", grid_mm(:, 1)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (grid_mm(:, 1) + grid_mm(:, 2))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
end if

allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)]
stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)]
stl_bounding_boxes(patch_id, 3, 1:3) = [bbox%min(3), (bbox%min(3) + bbox%max(3))/2._wp, bbox%max(3)]

models(patch_id)%model = model
models(patch_id)%boundary_v = boundary_v
models(patch_id)%boundary_edge_count = boundary_edge_count
if (interpolate) then
models(patch_id)%interpolate = 1
else
models(patch_id)%interpolate = 0
end if
if (interpolate) then
models(patch_id)%interpolated_boundary_v = interpolated_boundary_v
models(patch_id)%total_vertices = total_vertices
end if

end if
end do

! Pack and upload flat arrays for GPU (AFTER the loop)
block
integer :: pid, max_ntrs
integer :: max_bv1, max_bv2, max_bv3, max_iv1, max_iv2

max_ntrs = 0
max_bv1 = 0; max_bv2 = 0; max_bv3 = 0
max_iv1 = 0; max_iv2 = 0

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
call s_pack_model_for_gpu(models(pid))
max_ntrs = max(max_ntrs, models(pid)%ntrs)
end if
if (allocated(models(pid)%boundary_v)) then
max_bv1 = max(max_bv1, size(models(pid)%boundary_v, 1))
max_bv2 = max(max_bv2, size(models(pid)%boundary_v, 2))
max_bv3 = max(max_bv3, size(models(pid)%boundary_v, 3))
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
max_iv1 = max(max_iv1, size(models(pid)%interpolated_boundary_v, 1))
max_iv2 = max(max_iv2, size(models(pid)%interpolated_boundary_v, 2))
end if
end do

if (max_ntrs > 0) then
allocate (gpu_ntrs(1:num_ibs))
allocate (gpu_trs_v(1:3, 1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_trs_n(1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_interpolate(1:num_ibs))
allocate (gpu_boundary_edge_count(1:num_ibs))
allocate (gpu_total_vertices(1:num_ibs))

gpu_ntrs = 0
gpu_trs_v = 0._wp
gpu_trs_n = 0._wp
gpu_interpolate = 0
gpu_boundary_edge_count = 0
gpu_total_vertices = 0

if (max_bv1 > 0) then
allocate (gpu_boundary_v(1:max_bv1, 1:max_bv2, 1:max_bv3, 1:num_ibs))
gpu_boundary_v = 0._wp
end if

if (max_iv1 > 0) then
allocate (gpu_interpolated_boundary_v(1:max_iv1, 1:max_iv2, 1:num_ibs))
gpu_interpolated_boundary_v = 0._wp
end if

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
gpu_ntrs(pid) = models(pid)%ntrs
gpu_trs_v(:, :, 1:models(pid)%ntrs, pid) = models(pid)%trs_v
gpu_trs_n(:, 1:models(pid)%ntrs, pid) = models(pid)%trs_n
gpu_interpolate(pid) = models(pid)%interpolate
gpu_boundary_edge_count(pid) = models(pid)%boundary_edge_count
gpu_total_vertices(pid) = models(pid)%total_vertices
end if
if (allocated(models(pid)%boundary_v)) then
gpu_boundary_v(1:size(models(pid)%boundary_v, 1), &
1:size(models(pid)%boundary_v, 2), &
1:size(models(pid)%boundary_v, 3), pid) = models(pid)%boundary_v
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
gpu_interpolated_boundary_v(1:size(models(pid)%interpolated_boundary_v, 1), &
1:size(models(pid)%interpolated_boundary_v, 2), pid) = models(pid)%interpolated_boundary_v
end if
end do

$:GPU_ENTER_DATA(copyin='[gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices]')
if (allocated(gpu_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_boundary_v]')
end if
if (allocated(gpu_interpolated_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_interpolated_boundary_v]')
end if
end if
end block


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for all occurrences of stl_bounding_boxes in the codebase
echo "=== stl_bounding_boxes allocations and usage ==="
rg -n "stl_bounding_boxes" src/simulation/m_ib_patches.fpp src/common/m_model.fpp 2>/dev/null || echo "Not found in specified files"

# Broader search
rg -n "stl_bounding_boxes" --type f90 --type fpp 2>/dev/null || true

Repository: MFlowCode/MFC

Length of output: 999


🏁 Script executed:

#!/bin/bash
# Check total_vertices initialization and usage pattern
echo "=== total_vertices usage pattern ==="
rg -n "total_vertices" src/simulation/m_ib_patches.fpp | head -50

# Check gpu_total_vertices
echo "=== gpu_total_vertices pattern ==="
rg -n "gpu_total_vertices" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1328


🏁 Script executed:

#!/bin/bash
# Check gpu_* allocations and deallocations
echo "=== gpu_* array allocations ==="
rg -n "allocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

echo "=== gpu_* deallocations ==="
rg -n "deallocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

# Check if there's a finalize/cleanup subroutine
echo "=== Potential cleanup routines ==="
rg -n "subroutine.*finalize|subroutine.*cleanup" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check the module-level declarations to understand data structure
echo "=== Module-level declarations for gpu_* and stl_bounding_boxes ==="
rg -n "real.*gpu_|integer.*gpu_|allocatable.*gpu_|real.*stl_bounding|allocatable.*stl_bounding" src/simulation/m_ib_patches.fpp | head -40

# Look for module/global scope context
head -150 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 4044


s_instantiate_STL_models has critical allocation/initialization issues breaking multi-model scenarios.

All four issues are confirmed:

  1. stl_bounding_boxes allocation inside loop: Line 1069 allocates with allocate(stl_bounding_boxes(patch_id, 1:3, 1:3)) within the do patch_id = 1, num_ibs loop. This will fail on the second iteration—the first allocates shape (1,3,3), the second attempts (2,3,3), but the array is already allocated. Should allocate once before the loop with shape (num_ibs, 1:3, 1:3).

  2. total_vertices uninitialized when interpolate=false: The local variable total_vertices (line 958) is only assigned at lines 1040 or 1042 inside the if (interpolate) block. Line 1084 unconditionally copies it into models(patch_id)%total_vertices, propagating garbage values when interpolation is skipped. Initialize or conditionally assign.

  3. Incorrect spacing vector in interpolation: Lines 1040 and 1042 pass (/dx, dy, dz/) (array elements, not spacing magnitudes), though this is currently masked by line 1037 forcing interpolate = .false.. If interpolation is re-enabled, use (/dx_local, dy_local, dz_local/).

  4. Missing allocation guards on gpu_ arrays*: Lines 1116–1121 allocate module-level gpu_* arrays without checking if already allocated. Calling s_instantiate_STL_models twice will fail. Add guards: if (.not. allocated(gpu_ntrs)) allocate(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 954 - 1169,
s_instantiate_STL_models has four fixes: allocate stl_bounding_boxes once before
the patch loop with shape (num_ibs,1:3,1:3) instead of calling
allocate(stl_bounding_boxes(patch_id,...)) inside the loop; ensure
total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
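
A minimal sketch covering the first and fourth items, assuming the bounding-box allocation is hoisted above the do patch_id = 1, num_ibs loop and that the same allocated() guard pattern is then applied to the gpu_* allocations later in the block; the zero fill is illustrative.

        ! Allocate once with the full first extent instead of per patch inside the
        ! loop; the guard keeps a repeated call from hitting an already-allocated array.
        if (.not. allocated(stl_bounding_boxes)) then
            allocate (stl_bounding_boxes(1:num_ibs, 1:3, 1:3))
            stl_bounding_boxes = 0._wp
        end if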

Comment on lines +687 to 688
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

⚠️ Potential issue | 🟡 Minor

Fix outdated error message for num_ibs upper bound.

The validation allows up to 1000, but the message still says 10, which will mislead users. Consider deriving the bound from CONSTRAINTS to avoid future drift.

✅ Suggested fix
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
-                     "num_ibs must be between 1 and num_patches_max (10)")
+        max_ibs = CONSTRAINTS["num_ibs"]["max"]
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > max_ibs),
+                     f"num_ibs must be between 1 and num_patches_max ({max_ibs})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@toolchain/mfc/case_validator.py` around lines 687 - 688, The error message
passed to self.prohibit for the num_ibs check is stale (mentions 10) while the
actual upper bound is 1000; update the message to reflect the real max and, to
prevent future drift, construct the message using the authoritative constraint
value (e.g. the CONSTRAINTS constant or the class attribute used for bounds)
rather than a hardcoded literal—locate the self.prohibit call that checks
num_ibs (the line with self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
...)) and interpolate/format the upper-bound from CONSTRAINTS (or
self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as appropriate) so the message
reads "num_ibs must be between 1 and <max>".

@codeant-ai

codeant-ai bot commented Feb 20, 2026

CodeAnt AI is reviewing your PR.



@codeant-ai codeant-ai bot added the size:XL This PR changes 500-999 lines, ignoring generated files label Feb 20, 2026
@codeant-ai

codeant-ai bot commented Feb 20, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • GPU Kernel Data Missing
    Several newly-added GPU parallel loops (notably the STL model kernels such as s_ib_model and s_ib_3d_model) reference arrays and device buffers (e.g. x_cc, y_cc, dx, dy, dz, gpu_ntrs, gpu_trs_v, gpu_trs_n) that are not listed in the loop's copyin/copy lists. If those symbols are not available on-device by other means, kernels may read stale or host-only data, producing incorrect IB markings on the device. Verify all GPU_PARALLEL_LOOP directives include every array and device buffer the kernel reads.

  • Atomic counter / reduction
    Several GPU-parallel loops use local counters with copyin and then update them with GPU atomics (e.g. count, count_i, num_gps_local, num_inner_gps_local). The combination of copyin (per-thread copy) and atomic capture/update as used may not provide the intended global accumulation on device memory and can cause lost updates or incorrect final counts after the kernel. Verify that the counters are allocated in device-shared memory (or passed by device pointer) and that the atomic operations target those device-resident variables. Also ensure correct initialization on the device before the parallel region.

  • Memory / GPU upload sizing
    The STL packing and GPU upload path computes maxima (max_ntrs, max_bv*, max_iv*) and then allocates device-side arrays sized by num_ibs and these maxima. Check worst-case sizes (num_ibs up to 1000 per PR) to ensure memory fits on device. Also verify repeated allocate()/copyin sequences won't be invoked frequently at runtime (should be done once at initialization), and that the pack/enter-data sequence matches the lifetime of host-side arrays. A rough worked size estimate is sketched after this list.

  • Large compile-time bound
    The constant num_patches_max was increased from 10 to 1000. Verify all arrays, derived sizes, and routines that use this parameter can handle the larger bound without excessive memory usage, stack/static allocation problems, or performance regressions. Check MPI I/O, sizing of scratch buffers, and GPU memory footprints that depend on this constant.

  • Neighbor indexing / boundary safety
    The marker-neighborhood search loops directly index neighbors using ranges like i - gp_layers : i + gp_layers (and similarly for j, k) without explicit bounds checks in the new code paths. If gp_layers is larger than the buffer width, or if the indices fall close to the domain boundaries, this may read outside the intended domain or into invalid memory on the device. Confirm that the buffer/bounds guarantees still hold on all runs, or explicitly clamp the indices to the valid buffered range.

  • Interpolate type mismatch
    The variable interpolate was changed from a logical to an integer and is later tested both as a boolean (if (interpolate) then) and by equality (if (interpolate == 1) then). This inconsistent usage can be a source of subtle bugs if gpu_interpolate(...) doesn't return exactly the expected integer values (0/1). Ensure the value/type of gpu_interpolate and all checks are consistent.

  • Possible Bug
    The field interpolate was changed from logical to integer. Call-sites throughout the codebase may still treat it as a logical flag; this can change semantics and introduce subtle bugs. Verify every use-site, any comparisons or logical tests, and any GPU transfer code that expects booleans.

  • GPU-device compatibility
    Several routines and data accesses used for STL models were switched to GPU-resident arrays (e.g., gpu_trs_v, gpu_trs_n, gpu_interpolated_boundary_v, gpu_boundary_v) and GPU-enabled calls (e.g., f_distance_normals_3D_flat, f_interpolated_distance, f_normals, f_distance). Verify that these functions are device-capable (compiled for GPU) and accept the new GPU-side data layout. Also ensure proper data residency/synchronization so no host-only arrays are accessed on device implicitly.

  • RNG quality & parallel safety
    The PR replaces Fortran's RNG with a custom integer xorshift in f_model_random_number for use on the GPU. This is a low-quality PRNG for Monte-Carlo style ray sampling (potential bias) and the normalization (dividing by huge(seed)) may be fragile/implementation-dependent. Also the seed is derived deterministically from point coordinates and may create correlated samples for neighboring points or patterns across the domain. Verify statistical properties, reproducibility, and that the seed state is safe for parallel GPU execution.

  • Robustness: tri_idx usage
    In f_distance_normals_3D_flat (and similar routines) tri_idx is initialized to 0 and later used to index triangle arrays. If ntrs can ever be zero (or none chosen as closest), indexing with 0 will be invalid. Add defensive checks or ensure ntrs>0 is always true before using tri_idx.
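
A rough worked estimate for the "Memory / GPU upload sizing" item above, under stated assumptions (double precision, so 8 bytes per real; num_ibs = 1000; a hypothetical max_ntrs of 1e5 triangles): gpu_trs_v is shaped (3, 3, max_ntrs, num_ibs), i.e. 3 x 3 x 1e5 x 1000 x 8 B ≈ 7.2 GB for that one array, and gpu_trs_n adds 3 x 1e5 x 1000 x 8 B ≈ 2.4 GB. With max_ntrs around 1e4 the same pair drops to roughly 0.72 GB + 0.24 GB, so padding every patch to the largest model's triangle count is what drives the device footprint when triangle counts vary widely across the 1000 patches.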

dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
end do
dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
dir(:) = dir(:)/dir_mag

Suggestion: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
dir(:) = dir(:)/dir_mag

Replaced with:

if (dir_mag > 0._wp) then
    dir(:) = dir(:)/dir_mag
else
    dir(:) = 0._wp
end if
Why it matters? ⭐

Normalizing a zero-length direction vector can produce a division-by-zero or NaNs; guarding with a magnitude check directly fixes a correctness/numerical stability issue (priority #1 in the MFC rules). The suggested conditional is syntactically valid Fortran and directly prevents the unsafe division.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 613:613
**Comment:**
	*Custom Rule: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))

Suggestion: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU. [custom_rule]

Severity Level: Minor ⚠️

Suggested change (the guards run ahead of the existing allocate calls):
if (allocated(ma%trs_v)) deallocate(ma%trs_v)
if (allocated(ma%trs_n)) deallocate(ma%trs_n)
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
Why it matters? ⭐

The PR added allocation of packed triangle buffers but did not guard against repeated calls to s_pack_model_for_gpu leaving previously allocated arrays attached — this directly violates the MFC memory rule about matching ALLOCATE/DEALLOCATE pairing. The proposed change (deallocate if allocated before reallocate) is a minimal, correct fix that prevents leaks and addresses the custom rule about paired allocation/deallocation.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1399:1399
**Comment:**
	*Custom Rule: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
logical :: interpolate
integer :: interpolate

Suggestion: Use a logical type for the interpolation flag to keep the conditional checks standard-conforming and portable across compilers. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
integer :: interpolate
logical :: interpolate
Why it matters? ⭐

The PR declares interpolate as an INTEGER but elsewhere in this same subroutine it is tested in a logical context (e.g. if (interpolate) then in the 3D branch). Using an INTEGER in a logical expression is non‑standard Fortran and will break compiler portability (violates the repository's compiler portability guidance). Changing the declaration to LOGICAL directly addresses that rule violation and makes the intent explicit and standard-conforming.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 650:650
**Comment:**
	*Custom Rule: Use a logical type for the interpolation flag to keep the conditional checks standard-conforming and portable across compilers.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.


Suggestion: Preserve the result of the interpolation check instead of unconditionally disabling interpolation to avoid changing STL model resolution behavior. [custom_rule]

Severity Level: Minor ⚠️

Suggested change (remove this line)
interpolate = .false.
Why it matters? ⭐

The PR calls the interpolation-check routines but then unconditionally sets interpolate = .false., which negates the purpose of the checks. Per the repository's PR-pattern guidance (STL model interpolation / f_check_interpolation_* usage), the computed interpolate flag should be preserved so models that require interpolation remain interpolated. Removing the forced override directly restores correct behavior; this fixes a functional regression rather than being a cosmetic change.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/simulation/m_ib_patches.fpp
**Line:** 1032:1032
**Comment:**
	*Custom Rule: Preserve the result of the interpolation check instead of unconditionally disabling interpolation to avoid changing STL model resolution behavior.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

"Immersed Boundaries do not work in 1D (requires n > 0)")
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

Suggestion: Update the immersed boundaries count validation message so its documented range matches the new numerical constraint. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
"num_ibs must be between 1 and num_patches_max (10)")
"num_ibs must be between 1 and 1000")
Why it matters? ⭐

The current code enforces num_ibs <= 1000 but the error message still references num_patches_max (10), which is inconsistent and confusing for users. The proposed change updates the message to reflect the actual numeric constraint (1..1000), directly fixing a correctness/clarity issue related to constraint reporting. This aligns with the repository guidance to prioritize correctness and clear parameter constraints.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/mfc/case_validator.py
**Line:** 688:688
**Comment:**
	*Custom Rule: Update the immersed boundaries count validation message so its documented range matches the new numerical constraint.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp

Suggestion: In f_distance_normals_3D_flat, tri_idx remains zero when ntrs is zero, leading to an out-of-bounds access trs_n(:, tri_idx, pid) and undefined behavior; you should explicitly handle the ntrs <= 0 case before the loop by returning a safe default normal and distance. [off-by-one]

Severity Level: Major ⚠️
- ❌ 3D flat distance/normals routine may read out-of-bounds.
- ⚠️ GPU levelset computations for STL IBs may crash.
Suggested change
dist_min = 1.e12_wp
! Handle degenerate case with no triangles to avoid invalid indexing
if (ntrs <= 0) then
    normals = 0._wp
    distance = 0._wp
    return
end if
Steps of Reproduction ✅
1. Build the code including `module m_model` in `src/common/m_model.fpp`, ensuring
bounds-checking is enabled (e.g., `-Mbounds` with NVHPC or `-check bounds` with ifort).

2. From any caller (e.g., a level-set or IBM GPU packing routine) invoke
`f_distance_normals_3D_flat` (defined at lines 1203–1261 in `src/common/m_model.fpp`) with
`ntrs = 0`, and with `trs_v`/`trs_n` passed from a `t_model_array` whose `trs_v`/`trs_n`
were allocated via `s_pack_model_for_gpu` at lines 1395–1406 (which allocates them with
lower bound 1 in the triangle dimension).

3. When `f_distance_normals_3D_flat` executes, the `do i = 1, ntrs` loop (lines 1215–1236)
is skipped because `ntrs = 0`, leaving `tri_idx` at its initialized value of 0.

4. The routine then executes the assignments to `normals` at lines 1256–1258, accessing
`trs_n(:, tri_idx, pid)` with `tri_idx = 0`; since `trs_n` is indexed from 1, this is an
out-of-bounds access that will either trigger a runtime bounds error (with checking on) or
cause undefined behavior/incorrect normals on CPU or GPU.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1222:1222
**Comment:**
	*Off By One: In `f_distance_normals_3D_flat`, `tri_idx` remains zero when `ntrs` is zero, leading to an out-of-bounds access `trs_n(:, tri_idx, pid)` and undefined behavior; you should explicitly handle the `ntrs <= 0` case before the loop by returning a safe default normal and distance.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

"Immersed Boundaries do not work in 1D (requires n > 0)")
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

Suggestion: The validation condition now allows up to 1000 immersed boundaries, but the associated error message still says the maximum is 10, which will confuse users when num_ibs > 1000 is rejected with a message that states an incorrect upper limit; the message should be updated to reflect the new upper bound so that validation feedback matches actual behavior. [logic error]

Severity Level: Major ⚠️
- ⚠️ IBM case validation message misstates allowed num_ibs range.
- ⚠️ Users confused configuring more than ten immersed boundaries.
Suggested change
"num_ibs must be between 1 and num_patches_max (10)")
"num_ibs must be between 1 and 1000")
Steps of Reproduction ✅
1. Create a case file (e.g., `case.py`) that enables immersed boundaries with `ib = 'T'`
and sets `num_ibs = 1200` in the case parameters (mapped to keys `ib` and `num_ibs` in the
params dictionary passed to the validator).

2. Run the validation tooling, e.g. via `./mfc.sh validate case.py` as suggested in
`toolchain/mfc/case_validator.py:_format_errors()` (near the end of the file), which calls
`validate_case_constraints(params, stage='simulation')`.

3. Inside `validate_case_constraints()` (bottom of `toolchain/mfc/case_validator.py`), a
`CaseValidator` is created and `validator.validate('simulation')` is invoked, which in
turn calls `validate_simulation()`, then `validate_common()`, and finally `check_ibm()`
(around lines 679–691).

4. In `check_ibm()`, the condition `ib and (num_ibs <= 0 or num_ibs > 1000)` at line 687
evaluates True for `num_ibs = 1200`, so `self.prohibit()` records the error message
`"num_ibs must be between 1 and num_patches_max (10)"`; when `validate()` raises
`CaseConstraintError`, the user sees this misleading message that claims a maximum of 10
instead of the actual limit 1000.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/mfc/case_validator.py
**Line:** 688:688
**Comment:**
	*Logic Error: The validation condition now allows up to 1000 immersed boundaries, but the associated error message still says the maximum is 10, which will confuse users when `num_ibs > 1000` is rejected with a message that states an incorrect upper limit; the message should be updated to reflect the new upper bound so that validation feedback matches actual behavior.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai

codeant-ai bot commented Feb 20, 2026

CodeAnt AI finished reviewing your PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/simulation/m_ibm.fpp (2)

457-530: 🛠️ Refactor suggestion | 🟠 Major

Use deviceptr instead of private for the pointer variable s_cc.

The documentation specifies that deviceptr is the parameter for pointer variables in GPU regions. Currently, s_cc is declared in the private list despite being a pointer that is reassigned via => to x_cc, y_cc, or z_cc inside the kernel. Replace s_cc in the private list with deviceptr='[s_cc]' to follow the documented GPU macro API, or refactor to eliminate the pointer reassignment entirely by directly indexing the coordinate arrays based on the dim variable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 457 - 530, The GPU kernel macro
invocation lists s_cc as private but s_cc is a pointer reassigned (via =>) to
x_cc, y_cc, or z_cc inside the loop; change the kernel parameter handling to use
deviceptr for this pointer or eliminate the pointer reassignment. Specifically,
update the GPU region (the $:GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP
block) so that s_cc is passed as deviceptr='[s_cc]' instead of appearing in the
private list, or refactor the inner loop in the do dim loop to index x_cc, y_cc,
z_cc directly based on dim (removing s_cc and its pointer reassignments) while
keeping the rest of the logic that computes ip_loc, ip_grid, norm, and bound
intact (references: s_cc, x_cc, y_cc, z_cc, ghost_points_in, GPU_PARALLEL_LOOP).

1135-1143: ⚠️ Potential issue | 🟠 Major

Missing deallocation of ghost_points, inner_points, and models in finalize.

ghost_points and inner_points are allocated at lines 129–130 and entered into GPU data at line 132 via GPU_ENTER_DATA(copyin='[ghost_points,inner_points]') in s_ibm_setup, but s_finalize_ibm_module does not clean them up. Similarly, models is allocated at line 73 in s_initialize_ibm_module but never deallocated. This leaks both host and GPU memory.

Proposed fix
  impure subroutine s_finalize_ibm_module()

      @:DEALLOCATE(ib_markers%sf)
+     if (allocated(ghost_points)) then
+         $:GPU_EXIT_DATA(delete='[ghost_points,inner_points]')
+         @:DEALLOCATE(ghost_points)
+         @:DEALLOCATE(inner_points)
+     end if
+     if (allocated(models)) then
+         @:DEALLOCATE(models)
+     end if
      if (allocated(airfoil_grid_u)) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 1135 - 1143, s_finalize_ibm_module
currently deallocates sf and airfoil grids but misses freeing ghost_points,
inner_points and models allocated earlier; update s_finalize_ibm_module to
deallocate ghost_points, inner_points and models (matching allocations in
s_ibm_setup and s_initialize_ibm_module) and ensure any corresponding GPU data
is exited/released if present (mirror GPU_ENTER_DATA copyin for
ghost_points/inner_points). Locate the allocations for models (in
s_initialize_ibm_module) and ghost_points/inner_points (in s_ibm_setup) and add
corresponding DEALLOCATE calls and GPU exit/cleanup in s_finalize_ibm_module to
prevent host and device memory leaks.
🧹 Nitpick comments (3)
src/simulation/m_ib_patches.fpp (3)

929-931: Loop order in s_ib_3d_model is inconsistent with other 3D kernels and causes non-coalesced GPU writes.

All other 3D patch routines (s_ib_sphere, s_ib_cuboid, s_ib_cylinder) use k outermost / j middle / i innermost so that consecutive vector-lane threads write to contiguous ib_markers%sf(i,j,k) locations (Fortran column-major). This kernel has i outermost / k innermost, meaning adjacent vector lanes stride across the (i,j) plane.

♻️ Proposed fix — reverse loop order to match other 3D kernels
-        do i = il, ir
-            do j = jl, jr
-                do k = kl, kr
+        do k = kl, kr
+            do j = jl, jr
+                do i = il, ir
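
As a side note, a tiny standalone Fortran program (not MFC code) makes the column-major argument above concrete: the first index varies fastest in memory, so keeping i innermost gives unit-stride writes.

program column_major_demo
    implicit none
    integer :: a(2, 3), flat(6)
    integer :: i, j
    do j = 1, 3
        do i = 1, 2
            a(i, j) = 10*i + j       ! value encodes (i, j)
        end do
    end do
    flat = reshape(a, [6])           ! memory order of a
    print *, flat                    ! 11 21 12 22 13 23 -> i runs fastest
end program column_major_demo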
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 929 - 931, The s_ib_3d_model
kernel currently iterates do i, do j, do k which is inconsistent with other 3D
patch routines and causes non-coalesced GPU writes to ib_markers%sf(i,j,k);
change the loop order to match s_ib_sphere / s_ib_cuboid / s_ib_cylinder by
making k the outermost loop, j the middle, and i the innermost (i.e., iterate k
then j then i) so adjacent vector-lane threads write contiguous memory in
Fortran column-major layout.

814-829: s_ib_ellipse is the only 2D patch still scanning the full grid — add get_bounding_indices narrowing.

Every other 2D/3D primitive (s_ib_circle, s_ib_rectangle, s_ib_airfoil, etc.) was updated to call get_bounding_indices in this PR, but s_ib_ellipse still iterates over the full ghost-extended domain. For the 1000-particle use case this is a clear outlier.

♻️ Proposed narrowing for s_ib_ellipse
+        integer :: il, ir, jl, jr
+        real(wp) :: corner_distance
         ! Transferring the ellipse's centroid and length information
         center(1) = patch_ib(patch_id)%x_centroid
         center(2) = patch_ib(patch_id)%y_centroid
         ellipse_coeffs(1) = 0.5_wp*patch_ib(patch_id)%length_x
         ellipse_coeffs(2) = 0.5_wp*patch_ib(patch_id)%length_y
         inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)

+        il = -gp_layers; ir = m + gp_layers
+        jl = -gp_layers; jr = n + gp_layers
+        corner_distance = sqrt(dot_product(ellipse_coeffs, ellipse_coeffs))
+        call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
+        call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
+
         $:GPU_PARALLEL_LOOP(private='[i,j, xy_local]',&
                   & copyin='[patch_id,center,ellipse_coeffs,inverse_rotation,x_cc,y_cc]', collapse=2)
-        do j = -gp_layers, n + gp_layers
-            do i = -gp_layers, m + gp_layers
+        do j = jl, jr
+            do i = il, ir
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 814 - 829, s_ib_ellipse
currently scans the full ghost-extended grid inside the GPU_PARALLEL_LOOP;
update it to compute a tight loop range using the existing get_bounding_indices
helper (like other primitives) and only iterate i,j over that bounding box
(clamped by the ghost layers gp_layers and domain extents m,n). Specifically,
before the double loop, call get_bounding_indices with the ellipse center and
axes (center, ellipse_coeffs, inverse_rotation) to produce
min_i,max_i,min_j,max_j, then replace the do j/from -gp_layers,n+gp_layers and
do i/from -gp_layers,m+gp_layers ranges with the returned bounded ranges; keep
the same body that computes xy_local (using x_cc,y_cc) and sets
ib_markers%sf(i,j,0)=patch_id when inside the ellipse so behavior and data
references remain unchanged.

62-114: GPU checklist for src/simulation/ changes not confirmed complete.

Per PR objectives the GPU checklist items ("GPU results matching CPU" and "GPU vendor coverage") were listed but not checked. This PR introduces two new GPU kernels (s_ib_model, s_ib_3d_model) and the get_bounding_indices refactor across all IB primitives. Based on learnings: "For GPU code changes in src/simulation/, complete the GPU checklist in the PR template."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 62 - 114, The PR omits
completing the GPU checklist for changes that add GPU kernels—specifically
s_ib_model and s_ib_3d_model—and the get_bounding_indices refactor; run and
record GPU validation by (1) building and running the GPU test suite that covers
all IB primitives including s_ib_model and s_ib_3d_model, (2) comparing GPU
results to CPU baselines (numerical diffs/plots) and attaching the logs, and (3)
confirming vendor coverage by running the same GPU tests on available vendors
(e.g., NVIDIA, AMD, Intel) or noting which vendors were tested and any failures;
update the PR template checklist with pass/fail status, attach the test
commands, output files, and any small fixes needed to make get_bounding_indices
and the new kernels GPU-compatible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common/m_model.fpp`:
- Line 576: The expression computing fraction uses default real kind; update the
conversion to enforce working precision by replacing real(nInOrOut)/real(spc)
with real(nInOrOut, wp)/real(spc, wp) and apply the same change in the
f_model_is_inside_flat occurrence (line referencing spc and nInOrOut) so both
numerator and denominator use the wp kind consistently.
- Around line 541-543: The hash constants are written with the `_wp` suffix
(e.g. 73856093_wp) which creates real literals that will be rounded in
single-precision builds; change them to use explicit conversion from integer to
the working kind, e.g. replace occurrences like int(point(1)*73856093_wp) with
int(point(1)*real(73856093, wp)) (or equivalently real(73856093, wp) wrapped as
needed) in the rand_seed computation in m_model (the lines setting rand_seed)
and make the identical change in f_model_is_inside_flat (the three constants on
lines ~599–601) so the integer constants are converted explicitly to kind wp
instead of being written as real literals.

In `@src/simulation/m_compute_levelset.fpp`:
- Line 694: The conditional uses an integer "interpolate" as a logical; change
the check at the if statement (currently "if (interpolate)") to an explicit
integer comparison "if (interpolate == 1)" to match the other usage in this
routine and standard Fortran; update the same conditional in routine
m_compute_levelset (variable interpolate) so all branches use "interpolate == 1"
consistently.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 941-949: s_ib_3d_model is dereferencing
patch_ib(patch_id)%model_threshold inside the GPU kernel even though you already
copied the scalar threshold into the kernel; change the kernel check to use the
local scalar threshold (the scalar set at line ~910 and listed in copyin)
instead of accessing patch_ib(patch_id)%model_threshold so the kernel uses the
device-local scalar and avoids a possible host-to-device runtime fault — update
the conditional around the call to f_model_is_inside_flat and the assignment to
ib_markers%sf(i,j,k) to compare eta against threshold rather than patch_ib(...).
- Around line 428-430: The z-direction bounding call to get_bounding_indices
currently reuses ca_in and thus can exclude wing-tip cells; change the third
call so the z bounds use the correct half-span (use max(ca_in, lz/2) or at least
lz/2) instead of ca_in by passing center(3) - z_bound and center(3) + z_bound
with z_bound = max(ca_in, lz/2); update the call to
get_bounding_indices(center(3) - z_bound, center(3) + z_bound, z_cc, ll, lr) so
ll/lr cover the full airfoil span when marking IB cells.
- Around line 1257-1265: Add an explicit runtime check at the start of
subroutine get_bounding_indices to enforce the invariant that buff_size >=
gp_layers: validate the declared dummy array dimension(-buff_size:) against the
module/global gp_layers constant and, if the check fails, abort with a clear
error (e.g., error stop or process-level abort) so callers that set left_index =
-gp_layers cannot cause out-of-bounds access on cell_centers; place this check
immediately after the subroutine header and before using left_index/itr_left.

In `@src/simulation/m_ibm.fpp`:
- Around line 596-673: The GPU loop leaves ghost_points_in(local_idx)%DB(3)
uninitialized when p == 0; fix by explicitly initializing the third component of
the DB array for ghost points inside the is_gp branch: after setting DB(1) and
DB(2) (within the block guarded by is_gp) add logic that sets
ghost_points_in(local_idx)%DB(3) = 0 when p == 0 (or alternatively always
initialize DB(3) before the conditional on p), referencing the symbols
ghost_points_in, local_idx, p and DB so the change is placed in the same
GPU_PARALLEL_LOOP where is_gp is determined.

In `@src/simulation/m_time_steppers.fpp`:
- Line 806: The NVTX range name contains a typo: update the nvtxStartRange call
that currently uses "PROPAGATE-IMERSED-BOUNDARIES" (the call
nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")) to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools show the expected name.

---

Outside diff comments:
In `@src/simulation/m_ibm.fpp`:
- Around line 457-530: The GPU kernel macro invocation lists s_cc as private but
s_cc is a pointer reassigned (via =>) to x_cc, y_cc, or z_cc inside the loop;
change the kernel parameter handling to use deviceptr for this pointer or
eliminate the pointer reassignment. Specifically, update the GPU region (the
$:GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP block) so that s_cc is passed
as deviceptr='[s_cc]' instead of appearing in the private list, or refactor the
inner loop in the do dim loop to index x_cc, y_cc, z_cc directly based on dim
(removing s_cc and its pointer reassignments) while keeping the rest of the
logic that computes ip_loc, ip_grid, norm, and bound intact (references: s_cc,
x_cc, y_cc, z_cc, ghost_points_in, GPU_PARALLEL_LOOP).
- Around line 1135-1143: s_finalize_ibm_module currently deallocates sf and
airfoil grids but misses freeing ghost_points, inner_points and models allocated
earlier; update s_finalize_ibm_module to deallocate ghost_points, inner_points
and models (matching allocations in s_ibm_setup and s_initialize_ibm_module) and
ensure any corresponding GPU data is exited/released if present (mirror
GPU_ENTER_DATA copyin for ghost_points/inner_points). Locate the allocations for
models (in s_initialize_ibm_module) and ghost_points/inner_points (in
s_ibm_setup) and add corresponding DEALLOCATE calls and GPU exit/cleanup in
s_finalize_ibm_module to prevent host and device memory leaks.

---

Duplicate comments:
In `@src/common/m_model.fpp`:
- Around line 1394-1407: The subroutine s_pack_model_for_gpu unconditionally
allocates ma%trs_v and ma%trs_n which will crash if called again on the same
object; update s_pack_model_for_gpu to check allocated(ma%trs_v) and
allocated(ma%trs_n) and deallocate them (or use move_alloc/resizing strategy)
before allocating new arrays, ensuring ma%ntrs is assigned after any
deallocation and that allocations match ma%model%ntrs; reference ma%trs_v,
ma%trs_n, ma%ntrs, and s_pack_model_for_gpu when making the change.
- Around line 21-42: The public GPU-flat arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate,
gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes) and the
models(t_model_array) are not being deallocated; update the module finalization
path by implementing/augmenting s_model_free to safely deallocate each
allocatable (using allocated() checks) and nullify targets as needed, and ensure
s_model_free is called on teardown (or register a module final subroutine) so
all GPU/host memory is released.
- Around line 1203-1261: The subroutine f_distance_normals_3D_flat can leave
tri_idx at 0 when ntrs == 0, causing invalid access to trs_n(:,tri_idx,pid); add
an early guard at the top of the routine: if ntrs == 0 then set normals(1:3) =
0._wp (or another agreed sentinel) and distance = 1.e12_wp (matching dist_min)
and return; this prevents the later accesses to trs_n(1,tri_idx,pid),
trs_n(2,tri_idx,pid), trs_n(3,tri_idx,pid) from using tri_idx == 0.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 1120-1143: The module-level gpu_* arrays (gpu_ntrs, gpu_trs_v,
gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices,
gpu_boundary_v, gpu_interpolated_boundary_v) are being allocated in
s_instantiate_STL_models without allocation guards, causing double-allocation on
restart; modify s_instantiate_STL_models to check allocation status before
allocating (e.g., if (.not. allocated(gpu_ntrs)) then allocate(...) ) or
explicitly deallocate before re-allocating, applying this guard to every
allocate call for the listed symbols and for the conditional allocations of
gpu_boundary_v and gpu_interpolated_boundary_v so repeated calls no longer
error.
- Around line 857-884: The kernel currently does a full-grid scan (loops over
i=-gp_layers..m+gp_layers and j=-gp_layers..n+gp_layers) instead of using
get_bounding_indices like other STL paths, and it omits gpu arrays from the GPU
data clauses; fix by computing narrowed loop bounds via get_bounding_indices for
the patch before the GPU_PARALLEL_LOOP and iterate only over those indices
(replace the full-range i/j loops with the bounding ranges), and add gpu_ntrs,
gpu_trs_v, and gpu_trs_n to the GPU data clause (copyin/present) of the
GPU_PARALLEL_LOOP so f_model_is_inside_flat can safely access them while keeping
ib_markers%sf writes as before.
- Around line 1074-1090: The code allocates stl_bounding_boxes inside the
per-patch loop causing reallocation errors and conditionally sets
models(patch_id)%total_vertices only when interpolate is true even though
interpolate is forced false earlier; also interpolation calls pass full dx/dy/dz
arrays instead of scalar local spacings. Fix by moving the
allocate(stl_bounding_boxes(...)) so it happens once before the patch_id loop
(or guard it with allocated(...) check) to avoid reallocating, ensure
models(patch_id)%total_vertices is assigned unconditionally (or set
models(patch_id)%total_vertices = 0 when not interpolating) and only set
models(patch_id)%interpolated_boundary_v when interpolate is true, and change
calls that currently pass (/dx, dy, dz/) to pass the scalar local spacings
(/dx_local, dy_local, dz_local/) (look for symbols stl_bounding_boxes, models(%
)%total_vertices, interpolate, interpolated_boundary_v, and the interpolation
routine call sites using dx/dy/dz).

---

Nitpick comments:
In `@src/simulation/m_ib_patches.fpp`:
- Around line 929-931: The s_ib_3d_model kernel currently iterates do i, do j,
do k which is inconsistent with other 3D patch routines and causes non-coalesced
GPU writes to ib_markers%sf(i,j,k); change the loop order to match s_ib_sphere /
s_ib_cuboid / s_ib_cylinder by making k the outermost loop, j the middle, and i
the innermost (i.e., iterate k then j then i) so adjacent vector-lane threads
write contiguous memory in Fortran column-major layout.
- Around line 814-829: s_ib_ellipse currently scans the full ghost-extended grid
inside the GPU_PARALLEL_LOOP; update it to compute a tight loop range using the
existing get_bounding_indices helper (like other primitives) and only iterate
i,j over that bounding box (clamped by the ghost layers gp_layers and domain
extents m,n). Specifically, before the double loop, call get_bounding_indices
with the ellipse center and axes (center, ellipse_coeffs, inverse_rotation) to
produce min_i,max_i,min_j,max_j, then replace the do j/from
-gp_layers,n+gp_layers and do i/from -gp_layers,m+gp_layers ranges with the
returned bounded ranges; keep the same body that computes xy_local (using
x_cc,y_cc) and sets ib_markers%sf(i,j,0)=patch_id when inside the ellipse so
behavior and data references remain unchanged.
- Around line 62-114: The PR omits completing the GPU checklist for changes that
add GPU kernels—specifically s_ib_model and s_ib_3d_model—and the
get_bounding_indices refactor; run and record GPU validation by (1) building and
running the GPU test suite that covers all IB primitives including s_ib_model
and s_ib_3d_model, (2) comparing GPU results to CPU baselines (numerical
diffs/plots) and attaching the logs, and (3) confirming vendor coverage by
running the same GPU tests on available vendors (e.g., NVIDIA, AMD, Intel) or
noting which vendors were tested and any failures; update the PR template
checklist with pass/fail status, attach the test commands, output files, and any
small fixes needed to make get_bounding_indices and the new kernels
GPU-compatible.

Comment on lines +541 to +543
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)

⚠️ Potential issue | 🟠 Major

Lint failure: _wp suffix on integer hash constants loses precision in single-precision builds.

The pipeline flags 73856093_wp, 19349663_wp, 83492791_wp. With the _wp suffix and no decimal point these are integer literals of kind wp rather than reals, and the values exceed the largest integer a single-precision real can hold exactly (2^24, about 16.8M), so once they are converted for the multiplication they are silently rounded when wp is single precision. Use explicit conversion instead.

Proposed fix
-        rand_seed = int(point(1)*73856093_wp) + &
-                    int(point(2)*19349663_wp) + &
-                    int(point(3)*83492791_wp)
+        rand_seed = int(point(1)*real(73856093, wp)) + &
+                    int(point(2)*real(19349663, wp)) + &
+                    int(point(3)*real(83492791, wp))

Apply the same fix to lines 599–601 in f_model_is_inside_flat.
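
For reference, a minimal standalone Fortran check (not MFC code) of the rounding concern: 73856093 lies between 2^26 and 2^27, where single precision can only represent multiples of 8, so converting it to a 32-bit real and back does not return the original value.

program sp_rounding_demo
    implicit none
    integer, parameter :: sp = kind(1.0)   ! default single precision on most compilers
    integer :: round_trip
    round_trip = nint(real(73856093, sp))
    print *, 'original constant:', 73856093
    print *, 'after real(sp)   :', round_trip   ! 73856096 under IEEE round-to-nearest
end program sp_rounding_demo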

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)
rand_seed = int(point(1)*real(73856093, wp)) + &
int(point(2)*real(19349663, wp)) + &
int(point(3)*real(83492791, wp))
🧰 Tools
🪛 GitHub Actions: Lint Source

[error] 541-541: Pattern match found: '73856093_wp'


[error] 542-542: Pattern match found: '19349663_wp'


[error] 543-543: Pattern match found: '83492791_wp'

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 541 - 543, The hash constants are
written with the `_wp` suffix (e.g. 73856093_wp) which creates real literals
that will be rounded in single-precision builds; change them to use explicit
conversion from integer to the working kind, e.g. replace occurrences like
int(point(1)*73856093_wp) with int(point(1)*real(73856093, wp)) (or equivalently
real(73856093, wp) wrapped as needed) in the rand_seed computation in m_model
(the lines setting rand_seed) and make the identical change in
f_model_is_inside_flat (the three constants on lines ~599–601) so the integer
constants are converted explicitly to kind wp instead of being written as real
literals.

nInOrOut = nInOrOut + mod(nHits, 2)
end do

fraction = real(nInOrOut)/real(spc)

⚠️ Potential issue | 🟡 Minor

real() without kind parameter — use real(..., wp) for precision consistency.

real(nInOrOut)/real(spc) uses default real kind, which may differ from wp. Same issue at line 629 in f_model_is_inside_flat. As per coding guidelines: avoid implicit precision; use real(nInOrOut, wp)/real(spc, wp).

Proposed fix (line 576)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
Proposed fix (line 629)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
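
A small standalone Fortran comparison (not MFC code; plain integers stand in for nInOrOut and spc) shows the kind mismatch concretely: the default-kind quotient is computed in single precision and only then widened, so it differs from the quotient computed in wp.

program default_kind_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)
    real(wp) :: mixed, consistent
    mixed      = real(7)/real(3)           ! single-precision quotient, widened on assignment
    consistent = real(7, wp)/real(3, wp)   ! quotient computed in wp
    print *, mixed                         ! ~2.3333332538604736
    print *, consistent                    ! ~2.3333333333333335
end program default_kind_demo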
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` at line 576, The expression computing fraction uses
default real kind; update the conversion to enforce working precision by
replacing real(nInOrOut)/real(spc) with real(nInOrOut, wp)/real(spc, wp) and
apply the same change in the f_model_is_inside_flat occurrence (line referencing
spc and nInOrOut) so both numerator and denominator use the wp kind
consistently.

call f_distance_normals_3D_flat(gpu_ntrs(patch_id), gpu_trs_v, gpu_trs_n, patch_id, xyz_local, normals, distance)

! Get the shortest distance between the cell center and the interpolated model boundary
if (interpolate) then

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, examine the relevant lines in m_compute_levelset.fpp
cat -n src/simulation/m_compute_levelset.fpp | sed -n '645,660p'

Repository: MFlowCode/MFC

Length of output: 677


🏁 Script executed:

# Check lines 690-710 to see both occurrences
cat -n src/simulation/m_compute_levelset.fpp | sed -n '690,720p'

Repository: MFlowCode/MFC

Length of output: 1784


🏁 Script executed:

# Search for similar integer-as-logical patterns in the codebase
rg -nP 'if\s*\(\s*(?!.*==|.*\.or\.|.*\.and\.).*(?:interpolate|gpu_interpolate|gpu_ntrs|gpu_boundary_edge_count)\s*\)' src/simulation/ src/common/ --max-count=30

Repository: MFlowCode/MFC

Length of output: 357


🏁 Script executed:

# Check the context of similar patterns in m_ib_patches.fpp
cat -n src/simulation/m_ib_patches.fpp | sed -n '1035,1045p'

Repository: MFlowCode/MFC

Length of output: 663


🏁 Script executed:

cat -n src/simulation/m_ib_patches.fpp | sed -n '1080,1095p'

Repository: MFlowCode/MFC

Length of output: 804


🏁 Script executed:

# Also verify if interpolate is declared as integer in m_ib_patches.fpp
rg -n 'integer.*interpolate|integer :: .*interpolate' src/simulation/m_ib_patches.fpp | head -5

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Search for interpolate declaration in m_ib_patches.fpp context
rg -B5 -A5 'if (interpolate)' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check the subroutine signature and local declarations in m_ib_patches.fpp
rg -n 'subroutine.*m_ib_patches|logical.*interpolate|integer.*interpolate' src/simulation/m_ib_patches.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 172


Integer variable used as logical condition; change if (interpolate) to if (interpolate == 1).

The variable interpolate is declared as integer (line 650), but line 694 treats it as a logical condition, which is non-standard Fortran. Line 707 in the same file correctly uses if (interpolate == 1), creating an inconsistency within the same routine.

Proposed fix
-            if (interpolate) then
+            if (interpolate == 1) then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (interpolate) then
if (interpolate == 1) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_compute_levelset.fpp` at line 694, The conditional uses an
integer "interpolate" as a logical; change the check at the if statement
(currently "if (interpolate)") to an explicit integer comparison "if
(interpolate == 1)" to match the other usage in this routine and standard
Fortran; update the same conditional in routine m_compute_levelset (variable
interpolate) so all branches use "interpolate == 1" consistently.

Comment on lines +428 to +430
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)

⚠️ Potential issue | 🔴 Critical

Critical: z-direction bounding uses chord ca_in instead of half-span lz/2, silently excluding wing-tip cells.

Lines 428–429 correctly use ca_in for the x/y (chord-section) bounds, but line 430 reuses ca_in for z even though the airfoil spans lz in that direction (z_max = lz/2, z_min = -lz/2, lines 355–356). For any realistic wing where lz > 2*ca_in, the loop over l never reaches cells with |z_cc(l) - center(3)| > ca_in, so those cells are never tested against the airfoil geometry and remain unmarked — silently corrupting the IB marker field.

🐛 Proposed fix
-        call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)
+        call get_bounding_indices(center(3) - lz/2._wp, center(3) + lz/2._wp, z_cc, ll, lr)

For a rotated airfoil the tightest safe bound is max(ca_in, lz/2) per axis; the above is correct for the unrotated (common) case and consistent with how z_min/z_max are defined.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - lz/2._wp, center(3) + lz/2._wp, z_cc, ll, lr)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 428 - 430, The z-direction
bounding call to get_bounding_indices currently reuses ca_in and thus can
exclude wing-tip cells; change the third call so the z bounds use the correct
half-span (use max(ca_in, lz/2) or at least lz/2) instead of ca_in by passing
center(3) - z_bound and center(3) + z_bound with z_bound = max(ca_in, lz/2);
update the call to get_bounding_indices(center(3) - z_bound, center(3) +
z_bound, z_cc, ll, lr) so ll/lr cover the full airfoil span when marking IB
cells.

Comment on lines +941 to +949
eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xyz_local, (/dx(i), dy(j), dz(k)/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > patch_ib(patch_id)%model_threshold) then
ib_markers_sf(i, j, k) = patch_id
ib_markers%sf(i, j, k) = patch_id

⚠️ Potential issue | 🟡 Minor

s_ib_3d_model reads patch_ib(patch_id)%model_threshold inside the GPU kernel instead of the scalar threshold already copied in.

Line 910 captures threshold = patch_ib(patch_id)%model_threshold and line 928 lists it in copyin, yet line 948 discards it and dereferences the derived-type field again inside the kernel. s_ib_model (line 877) uses the local scalar correctly. Beyond inconsistency, if patch_ib is not independently present on the device this is a runtime fault.

🐛 Proposed fix
-                    if (eta > patch_ib(patch_id)%model_threshold) then
+                    if (eta > threshold) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 941 - 949, s_ib_3d_model is
dereferencing patch_ib(patch_id)%model_threshold inside the GPU kernel even
though you already copied the scalar threshold into the kernel; change the
kernel check to use the local scalar threshold (the scalar set at line ~910 and
listed in copyin) instead of accessing patch_ib(patch_id)%model_threshold so the
kernel uses the device-local scalar and avoids a possible host-to-device runtime
fault — update the conditional around the call to f_model_is_inside_flat and the
assignment to ib_markers%sf(i,j,k) to compare eta against threshold rather than
patch_ib(...).

Comment on lines +1257 to +1265
subroutine get_bounding_indices(left_bound, right_bound, cell_centers, left_index, right_index)

real(wp), intent(in) :: left_bound, right_bound
integer, intent(inout) :: left_index, right_index
real(wp), dimension(-buff_size:), intent(in) :: cell_centers

integer :: itr_left, itr_middle, itr_right

itr_left = left_index

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify buff_size and gp_layers are always equal in simulation code
echo "=== buff_size definition ==="
rg -n 'buff_size\s*=' src/simulation/ src/common/ --type fpp -B1 -A1 | head -30

echo "=== gp_layers definition/assignment ==="
rg -n 'gp_layers\s*=' src/simulation/ src/common/ --type fpp -B1 -A1 | head -30

echo "=== get_bounding_indices call sites ==="
rg -n 'call get_bounding_indices' src/simulation/ --type fpp -B2 -A1 | head -40

echo "=== cell_centers actual array declarations (x_cc, y_cc, z_cc) ==="
rg -nP 'x_cc|y_cc|z_cc' src/simulation/m_global_parameters.fpp src/common/m_global_parameters.fpp 2>/dev/null | grep -i 'allocat\|dimension' | head -20

Repository: MFlowCode/MFC

Length of output: 970


🏁 Script executed:

#!/bin/bash
# Search without --type flag
echo "=== buff_size definition/use ==="
rg -n 'buff_size' src/simulation/m_ib_patches.fpp src/simulation/m_global_parameters.fpp | head -40

echo "=== gp_layers definition/use ==="
rg -n 'gp_layers' src/simulation/m_ib_patches.fpp src/simulation/m_global_parameters.fpp | head -40

echo "=== Check m_ib_patches imports and parameter section ==="
head -100 src/simulation/m_ib_patches.fpp | grep -A 20 'subroutine\|parameter\|buff_size\|gp_layers'

echo "=== get_bounding_indices all calls with context ==="
rg -n 'get_bounding_indices' src/simulation/ -B3 -A1 | head -80

Repository: MFlowCode/MFC

Length of output: 12541


🏁 Script executed:

#!/bin/bash
# Find gp_layers definition/declaration
echo "=== gp_layers parameter/variable declaration ==="
rg -n 'integer.*gp_layers|gp_layers.*integer' src/simulation/ src/common/ -B2 -A2

echo "=== Search for parameter declarations in m_ib_patches ==="
head -200 src/simulation/m_ib_patches.fpp | grep -i 'parameter\|gp_layers'

echo "=== Search for gp_layers in m_global_parameters ==="
rg -n 'gp_layers' src/simulation/m_global_parameters.fpp | head -20

echo "=== Check if buff_size and gp_layers are related ==="
rg -n 'buff_size|gp_layers' src/simulation/m_global_parameters.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 2975


🏁 Script executed:

#!/bin/bash
# Find where buff_size is assigned
echo "=== buff_size assignments ==="
rg -n 'buff_size\s*=' src/ -B2 -A2 | head -80

echo "=== Check if buff_size is read from input or derived ==="
rg -n 'buff_size' src/simulation/m_global_parameters.fpp | grep -E 'read|input|set|=|parameter'

echo "=== Full context of buff_size variable declaration and related code ==="
sed -n '335,360p' src/simulation/m_global_parameters.fpp

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Verify there's no explicit documentation or assertion enforcing buff_size >= gp_layers
echo "=== Check for assertions or guards on buff_size >= gp_layers ==="
rg -n 'buff_size.*gp_layers|gp_layers.*buff_size' src/ --type-not test

echo "=== Check minimum values in buff_size calculation path ==="
sed -n '125,155p' src/common/m_helper_basic.fpp

echo "=== Verify IBM flag triggers buff_size >= 10 ==="
rg -n 'if.*ib.*then' src/common/m_helper_basic.fpp -A3 | head -20

Repository: MFlowCode/MFC

Length of output: 1463


Add an assertion to enforce buff_size ≥ gp_layers or document the implicit invariant.

The dummy array cell_centers is declared dimension(-buff_size:) (line 1261), and all callers initialize left_index = −gp_layers (−3). While the code is currently safe because get_bounding_indices is only called from IBM routines where buff_size ≥ 10, there is no explicit guard preventing out-of-bounds access if the invariant is violated. Consider adding an assertion at the subroutine entry point:

if (buff_size < gp_layers) then
    error stop 'get_bounding_indices: buff_size must be >= gp_layers'
end if

This makes the coupling between buff_size and gp_layers explicit and prevents latent bugs if the code is refactored or called from different contexts in the future.
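
To make the invariant concrete, here is a minimal standalone sketch (not MFC code; buff_size, gp_layers, and the -buff_size: lower bound are borrowed from the snippet above): the dummy array starts at -buff_size while the caller probes down to -gp_layers, so the guard only passes when buff_size >= gp_layers.

program bounding_invariant_demo
    implicit none
    integer, parameter :: buff_size = 5, gp_layers = 3
    real :: cc(-buff_size:buff_size)
    integer :: i
    cc = [(real(i), i = -buff_size, buff_size)]   ! stand-in cell centers
    call probe(cc, -gp_layers)
contains
    subroutine probe(cell_centers, left_index)
        real, intent(in) :: cell_centers(-buff_size:)   ! same lower bound as get_bounding_indices
        integer, intent(in) :: left_index
        if (buff_size < gp_layers) error stop 'buff_size must be >= gp_layers'
        print *, 'leftmost probed cell center:', cell_centers(left_index)
    end subroutine probe
end program bounding_invariant_demo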

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 1257 - 1265, Add an explicit
runtime check at the start of subroutine get_bounding_indices to enforce the
invariant that buff_size >= gp_layers: validate the declared dummy array
dimension(-buff_size:) against the module/global gp_layers constant and, if the
check fails, abort with a clear error (e.g., error stop or process-level abort)
so callers that set left_index = -gp_layers cannot cause out-of-bounds access on
cell_centers; place this check immediately after the subroutine header and
before using left_index/itr_left.

Comment on lines +596 to +673
count = 0
count_i = 0
gp_layers_z = gp_layers
if (p == 0) gp_layers_z = 0

$:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
do i = 0, m
do j = 0, n
if (p == 0) then
! 2D
if (ib_markers%sf(i, j, 0) /= 0) then
subsection_2D = ib_markers%sf( &
i - gp_layers:i + gp_layers, &
j - gp_layers:j + gp_layers, 0)
if (any(subsection_2D == 0)) then
ghost_points_in(count)%loc = [i, j, 0]
patch_id = ib_markers%sf(i, j, 0)
ghost_points_in(count)%ib_patch_id = &
patch_id

ghost_points_in(count)%slip = patch_ib(patch_id)%slip
! ghost_points(count)%rank = proc_rank
do k = 0, p
if (ib_markers%sf(i, j, k) /= 0) then
is_gp = .false.
marker_search: do ii = i - gp_layers, i + gp_layers
do jj = j - gp_layers, j + gp_layers
do kk = k - gp_layers_z, k + gp_layers_z
if (ib_markers%sf(ii, jj, kk) == 0) then
! if any neighbors are not in the IB, it is a ghost point
is_gp = .true.
exit marker_search
end if
end do
end do
end do marker_search

if (is_gp) then
$:GPU_ATOMIC(atomic='capture')
count = count + 1
local_idx = count
$:END_GPU_ATOMIC_CAPTURE()

ghost_points_in(local_idx)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
ghost_points_in(local_idx)%ib_patch_id = patch_id

ghost_points_in(local_idx)%slip = patch_ib(patch_id)%slip

if ((x_cc(i) - dx(i)) < x_domain%beg) then
ghost_points_in(count)%DB(1) = -1
ghost_points_in(local_idx)%DB(1) = -1
else if ((x_cc(i) + dx(i)) > x_domain%end) then
ghost_points_in(count)%DB(1) = 1
ghost_points_in(local_idx)%DB(1) = 1
else
ghost_points_in(count)%DB(1) = 0
ghost_points_in(local_idx)%DB(1) = 0
end if

if ((y_cc(j) - dy(j)) < y_domain%beg) then
ghost_points_in(count)%DB(2) = -1
ghost_points_in(local_idx)%DB(2) = -1
else if ((y_cc(j) + dy(j)) > y_domain%end) then
ghost_points_in(count)%DB(2) = 1
ghost_points_in(local_idx)%DB(2) = 1
else
ghost_points_in(count)%DB(2) = 0
ghost_points_in(local_idx)%DB(2) = 0
end if

count = count + 1

else
inner_points_in(count_i)%loc = [i, j, 0]
patch_id = ib_markers%sf(i, j, 0)
inner_points_in(count_i)%ib_patch_id = &
patch_id
inner_points_in(count_i)%slip = patch_ib(patch_id)%slip
count_i = count_i + 1

end if
end if
else
! 3D
do k = 0, p
if (ib_markers%sf(i, j, k) /= 0) then
subsection_3D = ib_markers%sf( &
i - gp_layers:i + gp_layers, &
j - gp_layers:j + gp_layers, &
k - gp_layers:k + gp_layers)
if (any(subsection_3D == 0)) then
ghost_points_in(count)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
ghost_points_in(count)%ib_patch_id = &
ib_markers%sf(i, j, k)
ghost_points_in(count)%slip = patch_ib(patch_id)%slip

if ((x_cc(i) - dx(i)) < x_domain%beg) then
ghost_points_in(count)%DB(1) = -1
else if ((x_cc(i) + dx(i)) > x_domain%end) then
ghost_points_in(count)%DB(1) = 1
else
ghost_points_in(count)%DB(1) = 0
end if

if ((y_cc(j) - dy(j)) < y_domain%beg) then
ghost_points_in(count)%DB(2) = -1
else if ((y_cc(j) + dy(j)) > y_domain%end) then
ghost_points_in(count)%DB(2) = 1
else
ghost_points_in(count)%DB(2) = 0
end if

if (p /= 0) then
if ((z_cc(k) - dz(k)) < z_domain%beg) then
ghost_points_in(count)%DB(3) = -1
ghost_points_in(local_idx)%DB(3) = -1
else if ((z_cc(k) + dz(k)) > z_domain%end) then
ghost_points_in(count)%DB(3) = 1
ghost_points_in(local_idx)%DB(3) = 1
else
ghost_points_in(count)%DB(3) = 0
ghost_points_in(local_idx)%DB(3) = 0
end if
end if

count = count + 1
else
inner_points_in(count_i)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
inner_points_in(count_i)%ib_patch_id = &
ib_markers%sf(i, j, k)
inner_points_in(count_i)%slip = patch_ib(patch_id)%slip
else
$:GPU_ATOMIC(atomic='capture')
count_i = count_i + 1
local_idx = count_i
$:END_GPU_ATOMIC_CAPTURE()

inner_points_in(local_idx)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
inner_points_in(local_idx)%ib_patch_id = patch_id
inner_points_in(local_idx)%slip = patch_ib(patch_id)%slip

count_i = count_i + 1
end if
end if
end do
end if
end if
end do
end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🟡 Minor

ghost_points_in(local_idx)%DB(3) uninitialized when p == 0.

In the 2D case (p == 0), the if (p /= 0) block (lines 647–655) is skipped, leaving DB(3) uninitialized for ghost points. While DB(3) may not be used in the 2D path of s_compute_image_points (since num_dims limits the loop), uninitialized data in a GPU-resident struct can cause subtle issues if any downstream code reads it.

Proposed fix
+                            ghost_points_in(local_idx)%DB(3) = 0
                             if (p /= 0) then
                                 if ((z_cc(k) - dz(k)) < z_domain%beg) then
                                     ghost_points_in(local_idx)%DB(3) = -1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 596 - 673, The GPU loop leaves
ghost_points_in(local_idx)%DB(3) uninitialized when p == 0; fix by explicitly
initializing the third component of the DB array for ghost points inside the
is_gp branch: after setting DB(1) and DB(2) (within the block guarded by is_gp)
add logic that sets ghost_points_in(local_idx)%DB(3) = 0 when p == 0 (or
alternatively always initialize DB(3) before the conditional on p), referencing
the symbols ghost_points_in, local_idx, p and DB so the change is placed in the
same GPU_PARALLEL_LOOP where is_gp is determined.

integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")

⚠️ Potential issue | 🟡 Minor

Typo in NVTX range name: "IMERSED" → "IMMERSED".

This label appears in profiling tools (e.g., Nsight Systems). The PR description images already show "PROPAGATE-IMMERSED-BOUNDARIES" as the expected label.

Proposed fix
-        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+        call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_time_steppers.fpp` at line 806, The NVTX range name contains
a typo: update the nvtxStartRange call that currently uses
"PROPAGATE-IMERSED-BOUNDARIES" (the call
nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")) to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools show the expected name.


Labels

size:XL This PR changes 500-999 lines, ignoring generated files
